
Cumulative feature and defect updates from recent Transformers PRs #42

Open
evalstate wants to merge 1808 commits into main from features-and-defects-750

Conversation

@evalstate
Owner

@evalstate evalstate commented Apr 29, 2026

Cumulative feature + defect PR mergeability branch

This PR is generated by the all-defects mergeability flow, run with both defect and feature categories enabled. It accumulates recent open huggingface/transformers PRs that could be merged, cherry-picked, patched, or identified as already present on the current base.

  • Source branch: features-and-defects-750
  • Base: evalstate/transformers:main
  • Head: 199afea711
  • PRs classified: 772
  • PRs with terminal state: 772
  • Accepted (merged / applied / already_present): 393
  • Validation failures reverted: 19
  • Aborted as impractical/codebase moved on: 213
  • Skipped categories: 147

Status counts

  • aborted: 213
  • already_present: 72
  • applied: 67
  • merged: 254
  • skipped: 147
  • validation_failed: 19

Category counts

  • defect: 299
  • documentation: 97
  • feature: 326
  • other: 50
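
The totals above can be cross-checked with a few lines of arithmetic. The sketch below only restates the numbers from this summary; the variable names are illustrative and not taken from the mergeability tooling. It also shows that the skipped count equals documentation plus other, which is consistent with only the defect and feature categories being candidates, although the summary does not state that relationship explicitly.

```python
# Counts copied from the summary above; names are illustrative only.
status_counts = {
    "aborted": 213,
    "already_present": 72,
    "applied": 67,
    "merged": 254,
    "skipped": 147,
    "validation_failed": 19,
}
category_counts = {
    "defect": 299,
    "documentation": 97,
    "feature": 326,
    "other": 50,
}

TOTAL_CLASSIFIED = 772
ACCEPTED_STATUSES = ("merged", "applied", "already_present")

# "Accepted" is the sum of the three terminal states that left code on the branch.
accepted = sum(status_counts[s] for s in ACCEPTED_STATUSES)

# Both breakdowns should account for every classified PR.
assert sum(status_counts.values()) == TOTAL_CLASSIFIED
assert sum(category_counts.values()) == TOTAL_CLASSIFIED
assert accepted == 393

# Observation, not a documented rule: skipped == documentation + other.
non_candidate = category_counts["documentation"] + category_counts["other"]
print(f"accepted: {accepted} / {TOTAL_CLASSIFIED}")
print(f"skipped matches non-candidate categories: {status_counts['skipped'] == non_candidate}")
```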

Validation note

The run used the configured lightweight validation profile (compileall, repo checkers, and impacted-test selection when available). Several later batches report validation as unavailable because the baseline/light-validation helper timed out or failed independently of the candidate PR.

The generated report is posted below as a PR comment. Raw local artifacts are:

  • .mergeability/defect-merge-state.jsonl
  • .mergeability/pr-classifications.jsonl
  • all-defects-report.md
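
For readers who want to recompute the status counts from the raw artifacts, a minimal sketch follows. It assumes each line of the `.jsonl` files is one JSON object and that the merge-state records carry a `status` field; that field name is an assumption, since the record schema is not shown in this PR.

```python
import json
from collections import Counter
from pathlib import Path


def tally_jsonl(path, field):
    """Count the values of `field` across the records of a JSON Lines file."""
    counts = Counter()
    with Path(path).open(encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue  # tolerate blank lines between records
            record = json.loads(line)
            counts[record.get(field, "<missing>")] += 1
    return counts


if __name__ == "__main__":
    # Path from the artifact list above; "status" is a hypothetical field name.
    state_file = Path(".mergeability/defect-merge-state.jsonl")
    if state_file.exists():
        for status, count in sorted(tally_jsonl(state_file, "status").items()):
            print(f"{status}: {count}")
```

Records that lack the chosen field are counted under `<missing>`, so a wrong field name shows up in the output instead of being silently dropped.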

evalstate and others added 30 commits April 29, 2026 06:17
# Conflicts:
#	src/transformers/conversion_mapping.py
# Conflicts:
#	src/transformers/initialization.py
# Conflicts:
#	src/transformers/tokenization_utils_tokenizers.py
# Conflicts:
#	src/transformers/models/colqwen2/modeling_colqwen2.py
#	src/transformers/models/colqwen2/modular_colqwen2.py
#	src/transformers/models/paligemma/modeling_paligemma.py
# Conflicts:
#	tests/test_tokenization_common.py
# Conflicts:
#	src/transformers/models/pi0/image_processing_pi0.py
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
# Conflicts:
#	src/transformers/models/auto/configuration_auto.py
#	src/transformers/models/auto/image_processing_auto.py
#	src/transformers/models/auto/modeling_auto.py
# Conflicts:
#	src/transformers/models/colqwen2/modeling_colqwen2.py
#	src/transformers/models/colqwen2/modular_colqwen2.py
# Conflicts:
#	src/transformers/integrations/flex_attention.py
#	src/transformers/utils/generic.py
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
# Conflicts:
#	src/transformers/models/qwen3_5/modeling_qwen3_5.py
# Conflicts:
#	src/transformers/conversion_mapping.py
#	src/transformers/models/auto/configuration_auto.py
#	src/transformers/models/auto/image_processing_auto.py
#	utils/check_repo.py
# Conflicts:
#	src/transformers/pipelines/audio_classification.py
#	src/transformers/pipelines/automatic_speech_recognition.py
#	tests/pipelines/test_pipelines_automatic_speech_recognition.py
# Conflicts:
#	src/transformers/utils/generic.py
Applied from upstream PR huggingface#39103 (1d85c39) with local patch because the direct PR merge conflicts with current Gemma3n v5/strict config docstring structure.
# Conflicts:
#	src/transformers/loss/loss_utils.py
# Conflicts:
#	src/transformers/models/kosmos2/modeling_kosmos2.py
# Conflicts:
#	src/transformers/quantizers/quantizer_bnb_4bit.py
#	src/transformers/quantizers/quantizer_bnb_8bit.py
# Conflicts:
#	src/transformers/models/llama4/convert_llama4_weights_to_hf.py
@evalstate evalstate changed the title Cumulative feature and defect fixes from recent Transformers PRs Cumulative feature and defect updates from recent Transformers PRs Apr 29, 2026
@evalstate
Owner Author

Generated mergeability report

All defects cumulative merge report

Run context

  • Repo: huggingface/transformers
  • Worktree: /home/ssmith/source/mergeability-test/.mergeability/defect-worktrees/features-and-defects-750
  • Branch: features-and-defects-750
  • Base ref: upstream/main (a8f43eca15b8d1c63deb33f6b97dfab30419e5da when created)
  • PR list: .mergeability/recent-prs.jsonl (open PRs)
  • Candidate categories: defect, feature
  • Validation commands:
    • . .mergeability/validation-venv/bin/activate && PYTHONPATH=src python -m compileall -q src/transformers
    • . .mergeability/validation-venv/bin/activate && PYTHONPATH=src python utils/checkers.py ruff_check,ruff_format,init_isort,sort_auto_mappings
    • . .mergeability/validation-venv/bin/activate && .mergeability/run-light-validation.sh
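
The three validation commands can be driven from a small wrapper. The sketch below is one possible shape, not the flow's actual runner: the function names are hypothetical, the 300-second default mirrors the timeout quoted in the merge notes, and mapping a timeout to `unavailable` and stopping at the first step that does not pass are both assumptions about how the statuses might be derived.

```python
import subprocess

# Commands copied from the report; they assume the repo root as working directory.
VENV = ". .mergeability/validation-venv/bin/activate"
VALIDATION_COMMANDS = [
    f"{VENV} && PYTHONPATH=src python -m compileall -q src/transformers",
    f"{VENV} && PYTHONPATH=src python utils/checkers.py "
    "ruff_check,ruff_format,init_isort,sort_auto_mappings",
    f"{VENV} && .mergeability/run-light-validation.sh",
]


def run_step(command, timeout_s=300):
    """Run one shell command; return 'passed', 'failed', or 'unavailable' on timeout."""
    try:
        result = subprocess.run(
            command, shell=True, timeout=timeout_s, capture_output=True, text=True
        )
    except subprocess.TimeoutExpired:
        return "unavailable"
    return "passed" if result.returncode == 0 else "failed"


def run_validation(commands, timeout_s=300):
    """Run commands in order and stop at the first one that does not pass."""
    results = []
    for command in commands:
        outcome = run_step(command, timeout_s)
        results.append((command, outcome))
        if outcome != "passed":
            break
    return results
```

Calling `run_validation(VALIDATION_COMMANDS)` from the worktree root would return a list of `(command, outcome)` pairs, one per step that was attempted.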

Record counts

  • Recent PRs in input list: 750
  • Classification records: 772
  • Merge-state records: 772
  • Recent PRs with terminal state: 750 / 750

Cumulative classifications

  • defect: 299
  • feature: 326
  • documentation: 97
  • other: 50

Cumulative merge states

  • merged: 254
  • applied: 67
  • already_present: 72
  • aborted: 213
  • validation_failed: 19
  • skipped: 147

Baseline validation for this turn

No baseline validation was run in this final turn because the only remaining unprocessed PR (huggingface#38786) was classified as documentation and skipped. The “before first merge attempt” requirement was therefore not triggered. The previous turn's baseline remains the latest available baseline: compileall passed, while checker and light validation were unavailable due to pre-existing cumulative worktree issues and untracked/generated files blocking tests_fetcher.py.

PRs processed this turn

  1. PR Provide clearer instructions on how to specify target language. huggingface/transformers#38786 — documentation — skipped: documentation-only clarification for MADLAD-400 target language tokens; category is not configured for this cumulative branch.

Newly merged/applied PRs this turn

None.

Validation results this turn

No merge was attempted and no validation was required or run this turn.

Recent merged/applied PRs

Recent aborted PRs

Resume state

  • Processed this turn: 1
  • Current consecutive abort count: 4
  • Next PR to process: none
  • Termination reason: complete — all PRs in .mergeability/recent-prs.jsonl have terminal state records.
runStatus:
  stopKind: complete
  processedThisTurn: 1
  nextPr: null
  consecutiveAborts: 4

@evalstate
Owner Author

Feature + defect flow status

Processed terminal records: 772

Status counts

| Status | Count |
| --- | --- |
| aborted | 213 |
| already_present | 72 |
| applied | 67 |
| merged | 254 |
| skipped | 147 |
| validation_failed | 19 |

Category counts

| Category | Count |
| --- | --- |
| defect | 299 |
| documentation | 97 |
| feature | 326 |
| other | 50 |

Merged / applied / already-present records (393)

PR Category Status Method Validation Original PR summary / goal Merge note
huggingface#45692 defect merged merge passed [Fix Phi4 test] Add option to override image_processor_auto_map with local code when trust_remote_code is True merged cleanly; validation passed with bounded light pytest targets
huggingface#45691 defect merged merge passed [serve] cb error merged cleanly; validation passed with bounded light pytest targets
huggingface#45690 feature merged merge passed [serve] Support for reasoning merged with adjacent class conflict resolved between serve error and reasoning helpers; validation passed
huggingface#45687 defect merged merge passed fix: Made histc_input robust for broader hardware merged cleanly; validation passed with bounded light pytest targets
huggingface#45686 defect merged merge passed Fix custom-module copies inheriting read-only permissions merged cleanly; validation passed with bounded light pytest targets
huggingface#45683 defect merged merge passed Exclude audio modules from conversion process merged cleanly; ruff fixed whitespace/formatting in changed file and amended merge commit; validation passed
huggingface#45682 defect merged merge passed FIX Restore LoRA hotswapping functionality merged cleanly; validation passed with bounded light pytest targets
huggingface#45681 defect already_present none not_run Restore TokenizersBackend override for DeepSeek V3/R1 tokenizer dispatch PR conflicted only because current cumulative branch already contains equivalent TokenizersBackend override and DeepSeek regression coverage; no code change ne…
huggingface#45678 defect merged merge passed Fix shared config mutation issue in flash_attn_from_config merged cleanly; validation passed with bounded light pytest targets
huggingface#45671 defect merged merge passed Update latest revision for Phi-4-multimodal test merged cleanly; validation passed with bounded light pytest targets
huggingface#45670 defect merged merge passed [nit] glmasr should be in AutoModelForMultimodalLM merged cleanly; validation passed with bounded light pytest targets
huggingface#45668 feature merged merge passed [GGUF] Add support for Qwen3.5 MoE (qwen35moe arch) merged cleanly; validation passed with bounded light pytest targets
huggingface#45662 defect merged merge passed Fix EP + FSDP2: experts silently overwritten by rank-0 broadcast merged cleanly; validation passed with bounded light pytest targets
huggingface#45661 feature merged merge passed [Weight Converter] More fine-grained mappings on classes, scoping for every transforms (including weight converter) merged cleanly; validation passed with bounded light pytest targets
huggingface#45654 feature merged merge passed [CB] Refactor any model-related code in a separate class merged cleanly; validation passed with bounded light pytest targets
huggingface#45653 feature already_present none not_run [CB] Better overall script and decode bucketting PR head is already an ancestor of the cumulative branch after merging PR huggingface#45654; no code change needed
huggingface#45651 feature merged merge passed [Trainer] Optimize LengthGroupedSampler computation with select_columns and tqdm merged cleanly; validation passed with bounded light pytest targets
huggingface#45649 defect merged merge passed Fix OOM regression for FSDP2 + cpu_ram_efficient_loading on large models merged cleanly; validation passed with bounded light pytest targets
huggingface#45645 defect merged merge passed Fix xdist collisions for captured_info artifacts and preserve CI debug logs merged cleanly; validation passed with bounded light pytest targets
huggingface#45643 feature merged merge passed Add DeepSeek V4 merged with conversion_mapping conflict resolved by preserving existing LlavaModel mapping and adding DeepSeek V4 mapping; ruff fixed import order and validati…
huggingface#45642 defect merged merge passed Fix trust_remote_code local cache collisions for local models (huggingface#45632) merged with test-file conflict resolved by preserving prior custom_object_save regression test and adding local-cache collision tests; validation passed
huggingface#45640 feature merged merge passed 🚨🚨🚨 [Trainer] Default to FSDP2, simplify API around fsdp + fsdp_config merged with trainer FSDP setup conflict resolved by preserving EP ignored-module setup while taking FSDP2 defaults; validation passed
huggingface#45639 defect already_present none not_run Make patched testing debug logs xdist-safe PR conflicted only with the previously merged huggingface#45645, which already provides xdist-safe captured_info paths, cleanup, and tests; no code change needed
huggingface#45638 feature merged merge passed Add Multi-Token Prediction (MTP) support for Qwen3.5 merged cleanly; validation passed with bounded light pytest targets
huggingface#45635 defect merged merge passed qa: speed up dtype regex weight load + reduce dtype tests to 3 random merged with prefix-handling conflict resolved by preserving interleaved conversion transforms and applying optimized prefix checks; ruff formatted changed file…
huggingface#45634 feature merged merge passed DeepGEMM BF16, isolation, refactor merged with moe conflict resolved by preserving DTensor local unwrapping and using a clamped bias index while keeping sentinel detection; validation passed
huggingface#45630 feature merged merge passed Add new model: Kimi2-6 merged cleanly; ruff formatted Kimi2-6 changed files and validation passed with bounded light pytest targets
huggingface#45627 defect merged merge passed Processing Utils: honor pre-built sub-processor kwargs in from_pretrained merged cleanly; validation passed with bounded light pytest targets
huggingface#45626 feature merged merge passed [Model] Add PP-FormulaNet Model Support merged cleanly; validation passed with bounded light pytest targets
huggingface#45621 feature merged merge passed Better Grouped GEMM + EP merged cleanly; validation passed with bounded light pytest targets
huggingface#45618 feature merged merge passed Add MTP speculative decoding via MTPCandidateGenerator merged cleanly; validation passed with bounded light pytest targets
huggingface#45615 defect merged merge passed fix(qianfan_ocr): add XPU expectations merged cleanly; validation passed with bounded light pytest targets
huggingface#45614 defect merged merge passed Add missing requests dependency to transformers[serving] merged cleanly; ruff formatted setup.py and validation passed with bounded light pytest targets
huggingface#45613 feature merged merge passed [New Model] Add MiniCPM3 support merged with auto-mapping conflicts resolved by preserving MiniCPMV4_6 entries and adding MiniCPM3 entries; validation passed
huggingface#45609 feature merged merge passed make it possible to ser/deser HF MoE models with torchao merged cleanly; validation passed with bounded light pytest targets
huggingface#45599 feature merged merge passed qa: more lazy loading merged cleanly; validation passed with bounded light pytest targets
huggingface#45597 feature merged merge passed Add Granite 4.1 Vision (granite4_vision) merged cleanly; validation passed with bounded light pytest targets
huggingface#45596 defect merged merge passed fix 2 failed test cases for blt model on XPU merged cleanly; validation passed with bounded light pytest targets
huggingface#45594 defect merged merge passed fix(utils): Resolve backbone utils test regressions merged cleanly; validation passed with bounded light pytest targets
huggingface#45591 defect merged merge passed [nemotron_h] respect _no_reinit flag on dt_bias and out_proj.weight merged cleanly; validation passed with bounded light pytest targets
huggingface#45586 feature merged merge passed Add Audio-Visual Flamingo model merged cleanly; validation passed with bounded light pytest targets
huggingface#45578 defect merged merge passed Remove attribute_map from GptOssConfig merged cleanly; validation passed with bounded light pytest targets
huggingface#45570 defect merged merge passed Fix whisper long-form generation when eos_token_id is a list merged cleanly; validation passed with bounded light pytest targets
huggingface#45568 defect merged merge passed Gemma4: fix failed test cases merged cleanly; validation passed with bounded light pytest targets
huggingface#45552 defect merged merge passed Remove warnings for modernbert merged cleanly; validation passed with bounded light pytest targets
huggingface#45549 defect merged merge passed fix: apply channel averaging correctly in audio feature extractors merged cleanly; validation passed with bounded light pytest targets
huggingface#45548 defect merged merge passed Fix EP + DeepSpeed ZeRO-3 loading via accelerate launch merged cleanly; removed duplicate has_ep definition caused by overlap with existing branch before rerunning validation, which passed with bounded light pytest …
huggingface#45546 feature merged merge passed feat: Add GGUF loading support for Llama 4 (text) clean merge; validation passed
huggingface#45541 defect merged merge passed Fix local_files_only tokenizer fallback when tokenizer files are missing (Issue 45538) clean merge; validation passed
huggingface#45524 defect merged merge passed utils: handle flash_attn missing from importlib packages_distributions without crashing clean merge; validation passed
huggingface#45523 defect merged merge passed Fix Seq2SeqLM ExecuTorch export: add encoder_attention_mask to decoder and use static encoder shapes clean merge; validation passed
huggingface#45512 defect merged merge passed [OutputRecorder] re.search on layer_name clean merge; validation passed
huggingface#45497 feature merged merge passed Add V-JEPA 2.1 inference support clean merge; validation passed
huggingface#45493 feature merged merge passed Modularize ProcessorMixin into smaller components clean merge; validation passed
huggingface#45490 feature merged merge passed Add ctsm model clean merge; validation passed
huggingface#45487 defect merged merge passed Fix model parallel issue for altclip model and ChineseClip model clean merge; validation passed
huggingface#45477 feature merged merge passed Blockwise mask fn as opt arg in all masking functions clean merge; validation passed
huggingface#45471 feature merged merge passed Add EXAONE 4.5 implementations resolved additive IGNORE_NON_TESTED conflict in utils/check_repo.py; validation passed
huggingface#45438 feature merged merge passed Add Gemma4ForSequenceClassification resolved Gemma4 additive conflicts with prior masking changes; validation passed
huggingface#45423 defect merged merge unavailable Fix void segmentation map label reduction merge clean; compileall and repo style checks passed; light validation unavailable because baseline run-light timed out after 300s
huggingface#45422 defect merged merge unavailable Drop content=None from messages in apply_chat_template merge clean; compileall and repo style checks passed; light validation unavailable because baseline run-light timed out after 300s
huggingface#45413 defect merged merge unavailable Fix EtaLogitsWarper on fully masked logits merge clean; compileall and repo style checks passed; light validation unavailable because baseline run-light timed out after 300s
huggingface#45391 feature merged merge unavailable audio tester class merge clean after mechanical ruff import/format fixes amended into merge; compileall and repo style checks passed; light validation unavailable because baselin…
huggingface#45389 defect merged merge unavailable Require input_ids for repetition penalty merge clean; compileall and repo style checks passed; light validation unavailable because baseline run-light timed out after 300s
huggingface#45379 defect applied cherry-pick passed fix(config): add deepstack_visual_indexes to Qwen3_5MoeVisionConfig cherry-pick clean; compileall and repo style checks passed; light validation selected no pytest targets
huggingface#45378 defect applied cherry-pick passed fix(mistral): guard ReasoningEffort import for older mistral_common versions cherry-pick clean; compileall and repo style checks passed; light validation selected no pytest targets
huggingface#45351 defect applied cherry-pick passed fix(testing_utils): guard get_device_capability with torch.cuda.is_available() cherry-pick clean; compileall and repo style checks passed; light validation ran 203 tests with 107 skipped
huggingface#45350 feature already_present none not_run WIP: Add support for Granite4VisionForConditionalGeneration Granite4Vision support is already present in the cumulative branch; direct merge of the older WIP PR conflicts add/add with existing granite4_vision files and …
huggingface#45346 defect applied cherry-pick passed Fix Double Application of Softmax for Router Logits in MoE models cherry-picked PR-only MoE router logits fixes; compileall and repo style checks passed; light validation selected no pytest targets
huggingface#45342 defect applied cherry-pick passed Use _keys_to_ignore_on_load_unexpected/missing recursively from children cherry-pick clean; compileall and repo style checks passed; light validation ran 203 tests with 107 skipped
huggingface#45333 feature merged merge passed Add heterogeneous config support (per-layer configuration) merged heterogeneous per-layer config support cleanly; validation passed
huggingface#45300 defect already_present none not_run Fix Nemotron-H: add mlp layer type support Nemotron-H mlp layer type support is already present in current configuration/modeling code; direct PR merge only conflicted with renamed layer_types APIs and …
huggingface#45294 feature already_present none not_run feat: add Gemma4ForSequenceClassification Gemma4 sequence classification classes, auto mappings, docs, and tests are already present in the current branch; direct merge conflicted with newer Gemma4 tex…
huggingface#45293 defect applied cherry-pick passed Fix "AttributeError: NewTokenizer has no attribute special_attribute_present" (Remove REGISTERED_FAST_ALIASES) cherry-picked AutoTokenizer registered fast alias cleanup to avoid unrelated merge-parent history; validation passed
huggingface#45273 defect applied cherry-pick passed fix: liger unnecessarily materializes logits in VRAM during eval, causing OOM cherry-picked Liger skip_logits eval OOM fix; validation passed
huggingface#45270 feature applied cherry-pick passed [Trainer] Support multi-loss component logging cherry-picked Trainer multi-loss component logging to avoid merge-parent tests_fetcher timeout; validation passed
huggingface#45233 defect applied patch passed feat: make timesfm2_5 onnx export compatible applied TimesFM ONNX export compatibility diff as a local patch; compile/checkers passed and light validation selected no pytest targets
huggingface#45221 defect applied cherry-pick passed user friendly error when loading audio from video cherry-picked user-friendly video-container audio error and reconciled timeout parameter in audio_utils; validation passed
huggingface#45218 feature applied patch passed Proposal: Agent-first CLI applied agent-first CLI proposal as a local patch; compile/checkers passed and light validation selected no pytest targets
huggingface#45202 defect applied patch passed Fix gemma4 has flash-attention incompatbile head-dim=512 applied minimal Gemma4 FlashAttention disable in modular and generated modeling files; compile/checkers passed and light validation selected no pytest targets
huggingface#45193 defect applied cherry-pick passed Config can apply pyndatic validation without torch-dependence cherry-picked config pydantic validation fix and style cleanup; validation passed
huggingface#45170 defect applied patch passed layrnorm -> layernorm applied layernorm typo fix across CLIP-like vision models; validation passed
huggingface#45168 feature applied cherry-pick passed Update min_lr and max_lr default values to better defaults cherry-picked scheduler default min_lr/max_lr update after direct merge caused light-validation timeout; validation passed
huggingface#45167 feature applied cherry-pick passed Add anthropic style of function schema cherry-picked Anthropic-style tool schema support and return-statement refactor; validation passed
huggingface#45157 feature applied cherry-pick passed [WIP] PrismML Bonsai model support cherry-picked Prism q1_0_g128 GGUF support and applied formatter; validation passed
huggingface#45147 defect applied cherry-pick passed Fix broken HQQ support cherry-picked HQQ integration fixes without PR merge-parent history; validation passed
huggingface#45134 feature applied cherry-pick passed Optimize Parakeet feature extraction on CUDA cherry-picked Parakeet CUDA feature-extraction optimization with small conflict resolution; compile/checkers passed and light validation selected no pytest tar…
huggingface#45128 defect applied cherry-pick passed Fix: handle future annotations in _process_kwargs_parameters cherry-picked future-annotations auto_docstring fix and amended import/format cleanup; validation passed
huggingface#45105 defect already_present none not_run Fix @auto_docstring crash with from future import annotations in _process_kwargs_parameters same auto_docstring future-annotations guard is already present from later PR huggingface#45128 applied on the cumulative branch
huggingface#45086 defect already_present none not_run fix AttributeError in _patch_mistral_regex Mistral regex tokenizer AttributeError fix was already present in the cumulative branch; direct merge produced an empty content change and cherry-pick was empty
huggingface#45082 feature applied cherry-pick passed [VidEoMT] Update conversion script cherry-picked VidEoMT converter update; compile/checkers passed and light validation selected no pytest targets
huggingface#45075 feature merged merge passed Add Deepseek-OCR-2 model merged with mechanical conversion_mapping conflict resolution preserving colqwen2 mapping and adding DeepSeek-OCR-2 mappings; validation passed
huggingface#45040 defect merged merge passed Llama3 video fix merged cleanly; ruff formatted VideoLlama3 test file and validation passed
huggingface#44989 feature merged merge passed 🚨 Distributed training API merged distributed API prototype scripts cleanly; validation passed with bounded light pytest targets
huggingface#44952 defect merged merge unavailable Fix: Add correct return behaviour when output_hidden_states=True for CLIP and SIGLIP vision models merge clean; compileall/checkers passed; light validation unavailable because baseline run-light timed out
huggingface#44940 defect merged merge unavailable Fix tie_weights skipping logic is not tied to model thread scope merged with mechanical initialization.py conflict resolution preserving meta_device_safe_creation_ops and adopting scoped tie-weight suppression; compileall/ch…
huggingface#44923 defect merged merge unavailable fix: avoid unconditional model_info call in _patch_mistral_regex merged with mechanical tokenizer local/offline conflict resolution and amended ruff formatting; compileall/checkers passed; light validation unavailable becaus…
huggingface#44907 defect merged merge unavailable Remove unnecessary expand_as in get_placeholder_mask across VLMs merged with mechanical VLM mask conflicts resolved in ColQwen2/PaliGemma and amended missing Callable import; compileall/checkers passed; light validation unav…
huggingface#44893 defect merged merge unavailable add StaticLayer.crop() to match DynamicLayer API merge clean; compileall/checkers passed; light validation unavailable because baseline run-light timed out
huggingface#44891 feature merged merge unavailable [Trainer] add MoERouterHealthCallback Callback merge clean; compileall/checkers passed; light validation unavailable because baseline run-light timed out
huggingface#44889 defect merged merge unavailable [DeepSpeed] Fix evaluate()/predict() before train() merge clean; compileall/checkers passed; light validation unavailable because baseline run-light timed out
huggingface#45694 defect merged merge unavailable Fix train_batch_size and eval_batch_size to respect split_batches config merged cleanly; amended mechanical ruff formatting in added test; compileall/checkers passed; light validation unavailable because baseline and rerun timed out
huggingface#44836 defect merged merge unavailable Add cu_seqlens support to OlmoHybridGatedDeltaNet for packed sequences merge clean; compileall/checkers passed; light validation unavailable because baseline run-light timed out
huggingface#44827 defect merged merge unavailable Fix Mistral4 tests merge clean; compileall/checkers passed; light validation unavailable because baseline run-light timed out
huggingface#44793 defect merged merge unavailable fix(janus): Handle None values in image generation mode merge clean; compileall/checkers passed; light validation unavailable because baseline run-light timed out
huggingface#44781 defect merged merge unavailable Fix _set_model_specific_special_tokens to accept list-format extra_special_tokens merged with modify/delete conflict resolved by preserving current tests/test_tokenization_common.py while applying tokenizer list-handling fix; compileall/chec…
huggingface#44771 defect merged merge unavailable wtf merged with mechanical image processor backend rename conflict resolved by applying SigLIP inheritance to current generated PI0 image processor; compileall/che…
huggingface#44731 defect merged merge unavailable [Tests] Fix slow video tensor creation from list of numpy arrays in SmolVLM merge clean; compileall/checkers passed; light validation unavailable because baseline and rerun timed out
huggingface#44724 defect merged merge unavailable Fix some missing / incorrect entries in auto files merged with mechanical auto-mapping conflicts resolved by keeping current regenerated mappings and adding mlcd_vision_model special module mapping; compileall/…
huggingface#44713 feature merged merge unavailable [ColQwen2] Refactor output tracing (issue huggingface#43979) merged with mechanical ColQwen2 output_hidden_states conflict resolution against current kwargs-based forward signature; compileall/checkers passed; light vali…
huggingface#44697 defect merged merge unavailable fix: torch_float should return float, not int merged with mechanical conflicts resolved by preserving current flex_attention helper API, applying torch_float float conversion, and adapting Doge flex LSE ca…
huggingface#44680 feature merged merge unavailable Allow kernel modules to declare their preferred mask function merge clean; compileall/checkers passed; light validation unavailable because baseline and rerun timed out
huggingface#44676 defect merged merge unavailable fix(gpt2): Resolve NaN/Inf issue in lm_head on Python 3.13 with tied weights merge clean; compileall/checkers passed; light validation unavailable because baseline and rerun timed out
huggingface#45695 feature merged merge unavailable Support for a new Granite-Speech-Plus model merge clean; compileall/checkers passed; light validation unavailable because baseline and rerun timed out
| huggingface#44664 | defect | merged | merge | unavailable | 🚨 Generic Sequence Classifier works for multimodal models | merged with Qwen3.5 generated-file conflict resolved to current modular layout and amended mechanical ruff formatting; compileall/checkers passed; light valida… |
| huggingface#44662 | feature | merged | merge | unavailable | [model] Add PenguinVL implementation | merged with registry/conversion/check_repo conflicts resolved against current auto_mappings and conversion mapping APIs; compileall/checkers passed; light vali… |
| huggingface#44660 | defect | merged | merge | unavailable | Fix: avoid late CUDA OOM in load_best_model_at_end with PEFT models | merge clean; compileall/checkers passed; light validation unavailable because baseline and rerun timed out |
| huggingface#44650 | defect | merged | merge | unavailable | Fix Seq2SeqTrainer generation path for decoder-only models | merge clean; compileall/checkers passed; light validation unavailable because baseline and rerun timed out |
| huggingface#44641 | defect | merged | merge | unavailable | Conditinally passing and_mask_function arg to create_causal_mask | merge clean; compileall/checkers passed; light validation unavailable because baseline and rerun timed out |
| huggingface#44635 | feature | merged | merge | unavailable | [Gemma] Modular-friendly buffers | merged with Gemma3n RMSNorm buffer conflict resolved by applying buffer fallback to current modular/generated code; compileall/checkers passed; light validatio… |
| huggingface#45697 | defect | already_present | none | unavailable | fix(testing): check torch.cuda.is_available() before get_device_capability | direct merge would replay unrelated BLT conflicts, but the CUDA-headless crash fix is already present in current get_device_properties via a torch.cuda.is_avai… |
| huggingface#44626 | defect | merged | merge | unavailable | don't break legacy behavior when enforced! | merge clean; compileall/checkers passed after rerunning serially; light validation unavailable because baseline and rerun timed out |
| huggingface#44615 | defect | merged | merge | unavailable | Restore is_torch_fx_available for trust_remote_code backwards compatibility | merge clean; compileall/checkers passed; light validation unavailable because baseline timed out |
| huggingface#44606 | defect | merged | merge | unavailable | optionally override tokenizer class with serialized tokenizer | merge clean; compileall/checkers passed; light validation unavailable because baseline timed out |
| huggingface#44603 | defect | merged | merge | unavailable | fixed dockerfile for arm64 systems | merge clean; compileall/checkers passed; light validation unavailable because baseline timed out |
| huggingface#44594 | feature | merged | merge | unavailable | [Pipeline] Add top_k, label filtering, box_format and score sorting to ObjectDetectionPipeline | merge clean; ruff format applied to changed object_detection.py and amended; compileall/checkers passed; light validation unavailable because baseline timed out |
| huggingface#44587 | defect | merged | merge | unavailable | Fix: Handling fused qkv result tensor slicing for tp sharded qkv weights | merge clean; ruff format applied to changed Falcon file and amended; compileall/checkers passed; light validation unavailable because baseline timed out |
| huggingface#44585 | defect | merged | merge | unavailable | Fix missing rms_norm_eps in DeepseekV3 MLA layernorms | merge clean; compileall/checkers passed; light validation unavailable because baseline timed out |
| huggingface#44569 | feature | merged | merge | unavailable | Add SarvamMLA model (sarvamai/sarvam-105b) | merged with current auto-mappings conflict resolved by adding SarvamMLA to generated auto_mappings and preserving existing PPChart2Table config checker excepti… |
| huggingface#45699 | feature | merged | merge | unavailable | Add FP8 kernel acceleration for compressed-tensors quantized models | merge clean; formatter/import-order applied and amended; compileall/checkers passed; light validation unavailable because baseline timed out and tests_fetcher … |
| huggingface#44543 | defect | merged | merge | unavailable | Fix assistant_masks for multimodal inputs in apply_chat_template | merge conflict in adjacent tests resolved by keeping both tests; compileall/checkers passed; light validation unavailable because baseline and rerun timed out |
| huggingface#44535 | defect | already_present | none | unavailable | Fix crash in Qwen2_5_VLProcessor when using batched input with padding=False | PR targets the old Qwen2_5_VLProcessor.call; current branch has moved mm_token_type_ids generation into ProcessorMixin.create_mm_token_type_ids, which alre… |
| huggingface#44438 | feature | merged | merge | unavailable | Add flashoptim | merge clean; compileall and repository style checks passed; light validation unavailable because baseline and post-merge run-light-validation timed out |
| huggingface#44408 | feature | merged | merge | unavailable | Add option to export encoder hidden states for Granite-speech | merge clean; compileall and repository style checks passed; light validation unavailable because baseline and post-merge run-light-validation timed out |
| huggingface#44385 | defect | merged | merge | unavailable | Fix make check-repo | merge clean; compileall and repository style checks passed; light validation unavailable because baseline and post-merge run-light-validation timed out |
| huggingface#44369 | feature | merged | merge | unavailable | Feature/integrations docs fix | merge completed with one straightforward documentation conflict resolved in favor of PR formatting; compileall and repository style checks passed; light valida… |
| huggingface#44348 | feature | merged | merge | unavailable | Enable MetalConfig to load pre-quantized MLX models from HuggingFace Hub | merge completed with one hub.py conflict resolved by preserving current tqdm_class handling and PR MLX shard fallback; compileall and repository style checks p… |
| huggingface#44270 | defect | merged | merge | unavailable | Add correct typing to custom images_kwargs in ProcessorsKwargs | merge completed with straightforward processor typing conflicts resolved by preserving current processor class names/imports and adding PR images_kwargs annota… |
| huggingface#44259 | feature | merged | merge | unavailable | Async data producer | merge clean; compileall and repository style checks passed after narrow ruff/init-isort fixes amended into the merge; light validation unavailable because base… |
| huggingface#44257 | defect | merged | merge | unavailable | use nanmean for aggregating loss | merge clean; compileall and repository style checks passed; light validation unavailable because baseline and post-merge run-light-validation timed out |
| huggingface#44228 | defect | merged | merge | unavailable | [Quantisation] account for nested tensors from quantisers | merge clean; compileall and repository style checks passed; light validation unavailable because baseline and post-merge run-light-validation timed out |
| huggingface#44215 | feature | merged | merge | unavailable | Add sequence classification capability to Granite models | merge completed with a single GraniteMoeHybrid test import conflict resolved to include the new sequence-classification class while preserving current imports;… |
| huggingface#44189 | defect | merged | merge | unavailable | fix: don't move model to device under other dist train backends | merge completed with one Trainer conflict resolved by applying the PR distributed-backend guards to the current fp16/bf16 full-eval device-placement block; com… |
| huggingface#44184 | feature | merged | merge | unavailable | feat: add OpenAI CircuitGPT core architecture and sparse linear layers | merge clean; compileall and repository style checks passed after narrow init-isort fix amended into the merge; light validation unavailable because baseline an… |
| huggingface#44171 | feature | merged | merge | unavailable | Parakeet tdt | merge completed with straightforward conflicts in hub kernel mapping and processor auto mapping resolved by preserving current entries and adding the PR tdt-lo… |
| huggingface#44159 | feature | already_present | none | not_run | Add SDPA and Flash Attention support for OWL-ViT | OWL-ViT attention backend support is already present on the cumulative branch: OwlViTAttention dispatches through ALL_ATTENTION_FUNCTIONS and OwlViTPreTrainedM… |
| huggingface#44142 | feature | merged | merge | unavailable | [voxtral-realtime] get more perfs! | merge completed with Voxtral Realtime generation conflicts resolved by adding precomputed audio_embeds support while preserving the current past_key_values/pos… |
| huggingface#44070 | feature | merged | merge | unavailable | Add GGUF loading support for Qwen3-Next (qwen3_next) architecture | merge completed with GGUF conflicts resolved by preserving current Qwen3.5/MiniMax/Llama4 mappings and adding Qwen3-Next entries; compileall and repository sty… |
| huggingface#44059 | feature | already_present | none | not_run | [GPT2] Refactor output tracing to use capture_outputs/can_return_tuple decorators | GPT-2 output tracing migration is already present on the cumulative branch: GPT2PreTrainedModel has _can_record_outputs, GPT2Model.forward uses capture_outputs… |
| huggingface#44056 | feature | merged | merge | unavailable | [MPNet] Refactor output tracing using capture_outputs decorator | merge completed with a small MPNet conflict resolved in favor of current config.return_dict handling; compileall and repository style checks passed; light vali… |
| huggingface#44044 | feature | merged | merge | unavailable | Refactor DeBERTa output tracing interface | merge completed with DeBERTa-v2 output-tracing conflicts resolved by keeping the decorator-based TransformersKwargs path and dropping stale manual return_dict/… |
| huggingface#44030 | feature | merged | merge | unavailable | refactor output tracing in dpr | merge completed with DPR conflicts resolved by using the decorator-based output tracing path and dropping stale manual output defaults; compileall and reposito… |
| huggingface#44029 | feature | merged | merge | unavailable | refactor output tracing in rwkv | merge completed with RWKV conflicts resolved by using decorator-based output tracing and dropping stale manual return_dict/default handling; compileall and rep… |
| huggingface#44028 | feature | merged | merge | unavailable | refactor output tracing for superpoint | merge completed with SuperPoint conflicts resolved by using decorator-based output tracing and dropping stale manual return_dict/hidden-state defaults; compile… |
| huggingface#44027 | feature | merged | merge | unavailable | refactor output tracing in | merge completed with speech_encoder_decoder output-tracing conflict resolved by keeping decorator-based output handling and dropping stale manual return_dict d… |
| huggingface#44026 | feature | merged | merge | unavailable | refactor output tracing for | merge completed with vision_encoder_decoder output-tracing conflicts resolved by applying decorator-based output handling while keeping current removal of cach… |
| huggingface#44025 | feature | merged | merge | unavailable | refactor output tracing for depth_anything | merge completed by applying depth_anything decorator-based output handling while keeping current prompt_depth_anything generated capture_outputs implementation… |
| huggingface#44024 | feature | merged | merge | unavailable | Focalnet standardized outputs | merge completed with FocalNet output-tracing conflicts resolved by applying the PR capture_outputs/can_return_tuple path while retaining the current backbone h… |
| huggingface#44019 | feature | merged | merge | unavailable | Refactor resnet to use capture_outputs/can_return_tuple output tracing | merge completed with ResNet output-tracing conflicts resolved by applying capture_outputs/can_return_tuple while retaining current backbone hidden-state filter… |
| huggingface#44017 | feature | merged | merge | unavailable | Refactor output tracing in segformers (huggingface#43979) | merge completed with a SegFormer image-classification output-tracing conflict resolved by dropping stale manual return_dict defaulting; compileall and reposito… |
| huggingface#44013 | feature | merged | merge | unavailable | Ouptut tracing: Standardizing MobileNetv2 | merge completed with MobileNetV2 output-tracing conflicts resolved by dropping stale manual return_dict/output_hidden_states defaulting; compileall and reposit… |
| huggingface#44010 | feature | merged | merge | unavailable | [SqueezeBert] Migrate to standardized output collection decorators | merge completed with SqueezeBert output-tracing conflicts resolved by applying the decorator-based PR side; compileall and repository style checks passed; ligh… |
| huggingface#44002 | feature | merged | merge | unavailable | refactor output tracing in upernet | merge completed with a UPerNet output-flag conflict resolved by keeping the decorator-based kwargs path; compileall and repository style checks passed; light v… |
| huggingface#44001 | feature | merged | merge | unavailable | refactor output tracing in univnet | merge completed with UnivNet return_dict conflict resolved by keeping the decorator-based can_return_tuple path; compileall and repository style checks passed;… |
| huggingface#44000 | feature | merged | merge | unavailable | refactor output tracing in vision_text_dual_encoder | merge completed cleanly for vision_text_dual_encoder output tracing; compileall and repository style checks passed; light validation unavailable because baseli… |
| huggingface#43999 | feature | merged | merge | unavailable | refactor output tracing in mobilenet_v1 | merge completed with MobileNetV1 output-tracing conflicts resolved by keeping the decorator-based return path; compileall and repository style checks passed; l… |
| huggingface#43998 | feature | merged | merge | unavailable | refactor output tracing in timm_backbone | merge completed with TimmBackbone output-tracing conflicts resolved by keeping the decorator-based kwargs path and removing stale filter_output_hidden_states/i… |
| huggingface#43997 | feature | merged | merge | unavailable | Migrate RegNet to standardized output tracing | merge completed with RegNet output-tracing conflicts resolved by keeping the decorator-based return path; compileall and repository style checks passed; light … |
| huggingface#43989 | defect | merged | merge | unavailable | Fix AutoVideoProcessor class lookup when torchvision is unavailable | merge clean after resolving AutoVideoProcessor lookup conflict for current string-or-None mapping; compileall and style passed; light validation unavailable du… |
| huggingface#43967 | defect | merged | merge | unavailable | Fix AttributeError in run_classification.py when detecting multi-label data | merge clean; compileall and style passed; light validation unavailable due to timeout as in baseline |
| huggingface#43961 | defect | already_present | none | not_run | Replace mutable default arguments with None | intended mutable-default fix is already present on the cumulative branch: Idefics uses immutable tuple defaults and debug_utils no longer has the mutable defau… |
| huggingface#43915 | feature | merged | merge | unavailable | add PaddleOCR-VL conversion | merge clean; fixed mechanical ruff issues in the added conversion script and amended merge commit; compileall/style passed; light validation unavailable due ti… |
| huggingface#43911 | defect | merged | merge | unavailable | add Llama to mapping names in tokenization_auto.py | merge clean; compileall and style passed; light validation unavailable due to timeout as in baseline |
| huggingface#43875 | defect | merged | merge | unavailable | Improve handling of QuantizedLayer.reset | merge clean; compileall and style passed; light validation unavailable due to timeout as in baseline |
| huggingface#43863 | feature | merged | merge | unavailable | [whisper] allow to pass text/audio specific kwargs | merge clean; compileall and style passed; light validation unavailable due to timeout as in baseline |
| huggingface#43842 | defect | already_present | none | not_run | fix(cli): Fix TypeAdapter NameError when pydantic is not installed | serve CLI has since been refactored into lazy serving handlers gated by is_serve_available, and no runtime TypeAdapter annotation remains in the current serve.… |
| huggingface#43838 | feature | merged | merge | unavailable | Qwen3 ASR and Forced Aligner | merge completed with small auto-mapping/check_repo conflicts resolved by retaining both current and Qwen3 ASR entries; compileall/style passed; light validatio… |
| huggingface#43836 | defect | already_present | none | not_run | fix: wrapped TypeAdpater in string literals (for now) | current serve CLI no longer contains runtime TypeAdapter annotations after the serving refactor; this duplicate pydantic-optional import fix is already superse… |
| huggingface#43833 | defect | merged | merge | unavailable | fix: ensure dtype consistency in grouped_mm under autocast | merge clean; compileall and style passed; light validation unavailable due to timeout as in baseline |
| huggingface#43826 | defect | already_present | none | not_run | fix: error message of pipeline | current PipelineRegistry.check_task has already removed the stale translation_XX_to_YY fallback and now reports only supported tasks, superseding the PR fix |
| huggingface#43823 | feature | merged | merge | unavailable | Add facebook/MobileLLM-125M | merge clean; amended mechanical ruff/import/format fixes in added MobileLLM files; compileall and style passed; light validation unavailable due to timeout as in … |
| huggingface#43816 | defect | already_present | none | not_run | fix: add id and resume parameters to SwanLab integration | current SwanLab integration already supports SWANLAB_RUN_ID and SWANLAB_RESUME, with an additional default resume=allow when resume_from_checkpoint is used; no… |
| huggingface#43779 | feature | merged | merge | unavailable | SwanLab: Add support for id and resume arguments in SwanLabCallback | merge clean; compileall and style passed; light validation unavailable due to timeout as in baseline |
| huggingface#43775 | defect | merged | merge | unavailable | fix(moe): normalize auxiliary loss by top_k for correct load balancing | merge clean; compileall and repository style checks passed; baseline light validation timed out after 300s |
| huggingface#43747 | defect | merged | merge | unavailable | Remove CompressedLinear support for compressed-tensors > 0.13 | merge completed with manual conflict resolution preserving current FP8-kernel guard; compileall and repository style checks passed after narrow ruff format; ba… |
| huggingface#43663 | feature | applied | patch | unavailable | Add _get_signature_columns method to allow custom trainers to override column filtering | applied overridable _get_signature_columns hook; compileall and repository style checks passed; baseline light validation timed out after 300s |
| huggingface#43654 | defect | merged | merge | unavailable | fix(tokenizer): Avert special token property overwrites in batch add_tokens calls | merge completed with additive test conflict resolved by keeping both tests; compileall and repository style checks passed; baseline light validation timed out … |
| huggingface#43651 | feature | merged | merge | unavailable | Add _loss_is_scaled_for_ga to allow custom trainers to control gradient accumulation loss scaling | merge clean; compileall and repository style checks passed; baseline light validation timed out after 300s |
| huggingface#43636 | feature | applied | patch | unavailable | Add _metrics dict to Trainer for custom metric logging | applied custom Trainer _metrics collection; compileall and repository style checks passed; baseline light validation timed out after 300s |
| huggingface#43613 | feature | merged | merge | unavailable | Add Promptable Visual Segmentation pipeline | merge completed with generated pipeline registration/docs conflicts resolved to current generated style; compileall and repository style checks passed; baselin… |
| huggingface#43612 | feature | merged | merge | unavailable | Add Promptable Concept Segmentation pipeline | merge completed with adjacent promptable visual/concept segmentation generated registrations kept together; compileall and repository style checks passed; base… |
| huggingface#43549 | defect | merged | merge | unavailable | [kernels] exception handling for fa kernels | merge clean; compileall and repository style checks passed; baseline light validation timed out after 300s |
| huggingface#43543 | defect | merged | merge | unavailable | Fix fp16 underflow in MoE load balancing loss by enforcing fp32 softmax | merge clean; compileall and repository style checks passed; baseline light validation timed out after 300s |
| huggingface#43542 | defect | applied | patch | unavailable | fix: output router capture wrong router logits in qwen moe models | applied router_logits preservation while keeping fp32 softmax routing weights; compileall and repository style checks passed; baseline light validation timed o… |
| huggingface#43506 | feature | merged | merge | unavailable | Add RishAI model with full transformers integration | merge clean; narrow ruff format applied to new RishAI files and amended into merge commit; compileall and repository style checks passed; baseline light valida… |
| huggingface#43498 | defect | merged | merge | unavailable | fix/backward compatibility for tie_weights | merge clean; narrowed deprecated alias to self.tie_weights and applied ruff fix/format in changed file; compileall and repository style checks passed; baseline… |
| huggingface#43492 | feature | merged | merge | unavailable | Perception Encoder follow up PR | merge completed with current strict config style retained and PE audio/video conversion mapping added; compileall and repository style checks passed; baseline … |
| huggingface#43484 | feature | merged | merge | unavailable | Optimize Ernie 4.5 VL timestamp rendering with cached overlays | merge completed with timestamp overlay cache adapted to current Ernie 4.5 VL Moe class naming; compileall and repository style checks passed; baseline light va… |
| huggingface#43469 | feature | merged | merge | unavailable | argparser: Allow optional bool flags without values | merge clean; compileall and repository style checks passed; baseline light validation unavailable because tests_fetcher checkout is blocked by pre-existing unt… |
| huggingface#43466 | defect | merged | merge | unavailable | Fix mask loss to ignore padding areas in object detection | merge clean; compileall and repository style checks passed; baseline light validation unavailable because tests_fetcher checkout is blocked by pre-existing unt… |
| huggingface#43451 | feature | merged | merge | unavailable | Add Molmo2 | merge completed with conversion mapping conflict resolved and auto mappings sorted/amended; compileall and repository style checks passed; light validation una… |
| huggingface#43395 | defect | merged | merge | unavailable | Fix label truncation for per-sample nested structures in Trainer | merge completed with trainer import conflict resolved; compileall and repository style checks passed; baseline light validation unavailable because tests_fetch… |
| huggingface#43382 | feature | merged | merge | unavailable | Allow Path type in transformers.image_utils.load_image function | merge completed with simple image_utils conflict resolved to retain current imports while adding Path input typing; compileall and repository style checks pass… |
| huggingface#43378 | defect | merged | merge | unavailable | feat(models): Make MimiModel encoding padding-aware to ensure batch-to-individual consistency | merge clean; compileall and repository style checks passed; baseline light validation unavailable because tests_fetcher checkout is blocked by pre-existing unt… |
| huggingface#43363 | feature | merged | merge | unavailable | [Improvement] Update DistributedLengthGroupedSampler to allow customizing length function | merge completed with sampler conflict resolved by retaining current optimized default length computation while adding custom length_func/mega_batch_mult suppor… |
| huggingface#43291 | defect | merged | merge | unavailable | Fix whisper tests | merge clean; compileall and repository style checks passed; baseline light validation unavailable because tests_fetcher checkout is blocked by pre-existing unt… |
| huggingface#43270 | defect | merged | merge | unavailable | fix _retrieve_segment timestamps offset bug | merge clean; fixed PR-local trailing whitespace/formatting and amended merge; compileall and repository style checks passed; baseline light validation unavaila… |
| huggingface#43254 | defect | merged | merge | unavailable | Add supported kwargs to fixed_cross_entropy | merge clean; compileall and repository style checks passed; baseline light validation unavailable because tests_fetcher checkout is blocked by pre-existing unt… |
| huggingface#43251 | defect | already_present | none | not_run | Fix(43240): pass kwargs to nn.functional.cross_entropy | same fixed_cross_entropy weight and label_smoothing support already merged from PR 43254; PR head is older and would also remove current loss mappings |
| huggingface#43212 | defect | merged | merge | unavailable | Add regression test for offline tokenizer loading (fixes huggingface#43200) | merge clean; compileall and repository style checks passed; baseline light validation unavailable because tests_fetcher checkout is blocked by pre-existing unt… |
| huggingface#43151 | defect | merged | merge | unavailable | Make TF32 tests hardware-aware for PyTorch 2.9+ | merge clean; compileall and repository style checks passed; baseline light validation unavailable because tests_fetcher checkout is blocked by pre-existing unt… |
| huggingface#43133 | defect | merged | merge | unavailable | Fix flaky SAM-HQ integration tests by adding set_seed | merge completed with SAM-HQ conflicts resolved by applying the positional-embedding sharing fix to current modular/generated files; compileall and repository s… |
| huggingface#43096 | defect | applied | cherry-pick | unavailable | Fix save_pretrained for quantized models with custom serialization | applied the defect-fix commit only with the save_original_format guard resolved against current PEFT handling; direct PR-head merge was avoided because the hea… |
| huggingface#43094 | feature | merged | merge | unavailable | Avoid inline .item() sync in decoder start token check | merge clean; compileall and repository style checks passed; baseline light validation unavailable because tests_fetcher checkout is blocked by pre-existing unt… |
| huggingface#43088 | feature | merged | merge | unavailable | Skip attention_mask.all() GPU-CPU sync during generation | merge completed with import conflict resolved against current generation utilities; compileall and repository style checks passed; baseline light validation un… |
| huggingface#43085 | feature | merged | merge | unavailable | Add async_stopping_criteria flag to reduce GPU-CPU syncs during generation | merge completed with generation config/utils conflicts resolved and PR-local unused cur_len lint fixed by amendment; compileall and repository style checks pas… |
| huggingface#43056 | feature | merged | merge | unavailable | Perf: enable pin_memory in DataLoader for CLM no_trainer example | merge clean; compileall and repository style checks passed; baseline light validation unavailable because tests_fetcher checkout is blocked by pre-existing unt… |
| huggingface#43044 | feature | merged | merge | unavailable | [SAM3] Enable single-scale input support in Mask Decoder | merge completed with test import conflict resolved against current SAM3 test imports and PR-local Union annotation/formatting fixed by amendment; compileall an… |
| huggingface#43028 | defect | applied | patch | unavailable | Fix default interpolation to BICUBIC for ViT, EfficientNet, PVT | ported the remaining ViTImageProcessor default interpolation change to BICUBIC; compileall and repository style checks passed; light validation unavailable aft… |
| huggingface#43015 | defect | merged | merge | unavailable | FIX: TF32 warning (huggingface#43012) | merge clean; compileall and repository style checks passed; baseline light validation unavailable because tests_fetcher checkout is blocked by pre-existing unt… |
| huggingface#42979 | defect | merged | merge | unavailable | Fix dtype mismatch in in modeling_llava_next | merge clean; compileall and repository style checks passed; baseline light validation unavailable because tests_fetcher checkout is blocked by pre-existing unt… |
| huggingface#42942 | defect | applied | patch | unavailable | Fix result retrieval starvation and terminate request-scoped iteration on completion | ported starvation fix by deferring mismatched request results until a matching result is found and stopping request iteration on completion/cancellation; compi… |
| huggingface#42916 | defect | already_present | none | unavailable | Fix: Apply clean_up_tokenization_spaces in TokenizersBackend._decode | current TokenizersBackend._decode already honors clean_up_tokenization_spaces with the branch-specific BPE safeguard, which supersedes the PR implementation; n… |
| huggingface#42881 | defect | merged | merge | unavailable | [GGUF] Add attn_logit_softcapping to Gemma2/Gemma3 config mapping | merge clean; compileall and repository checkers passed, but light validation unavailable due to pre-existing validation/test_fetcher failures (baseline KeyError; … |
| huggingface#42865 | defect | merged | merge | unavailable | Raise explicit error when FP8 is requested via from_config | merged with localized resolution in modeling_utils.init; compileall and repository checkers passed, but light validation unavailable due to pre-existing valid… |
| huggingface#42793 | defect | merged | merge | unavailable | [Quantization] Fixing last issues for a green 2026 CI hopefully | merge clean; compileall and repository checkers passed, but light validation unavailable due to pre-existing validation/test_fetcher checkout failures from untrac… |
| huggingface#42774 | defect | already_present | none | not_run | Fix: add None check for extractors in video_processor_class_from_name | current branch already contains the None guard in video_processor_class_from_name (if extractors is not None and class_name in extractors), so PR fix needed no… |
| huggingface#42765 | feature | merged | merge | unavailable | Add distributed training CI | merged with localized CI marker/test-fetcher resolutions and formatter fixes; compileall and repository checkers passed, but light validation unavailable due p… |
| huggingface#42717 | defect | merged | merge | unavailable | image_transforms: fix tensor annotations | merge clean; compileall and repository checkers passed, but light validation unavailable due to pre-existing tests_fetcher checkout blocker from untracked generat… |
| huggingface#42598 | defect | merged | merge | unavailable | [pipeline] fix unwanted import failure | merge clean; compileall and repository checkers passed, but light validation unavailable due to pre-existing tests_fetcher checkout blocker from untracked generat… |
| huggingface#42542 | defect | already_present | none | not_run | handle get_input_embeddings() on models like gemma3 gracefully | current branch already contains the intended guard for get_input_embeddings returning None or an object without register_forward_hook in enable_input_require_g… |
| huggingface#42496 | feature | already_present | none | not_run | feat: allow CP with trainer | current trainer already contains context-parallel training support including cp_size helpers, shift_labels preparation, padding to cp_size*2, and accelerator.m… |
| huggingface#42493 | defect | merged | merge | unavailable | fix: remove trailing os sep in local pretrained model path | merged with localized test conflict resolution preserving existing dynamic-module tests and adding trailing-separator coverage; compileall and repository check… |
| huggingface#42446 | defect | merged | merge | unavailable | Fix DataParallel dtype access in smolvlm | merge clean; compileall and repository checkers passed, but light validation unavailable due to pre-existing tests_fetcher checkout blocker from untracked generat… |
| huggingface#42432 | feature | merged | merge | unavailable | Add VideoToTextPipeline with smart frame sampling and system prompts | merged with localized pipeline registry conflict resolution and formatter-only amend; compileall and repository checkers passed, but light validation unavailab… |
| huggingface#42424 | defect | merged | merge | unavailable | [WIP] attempt to fix ooms in tests | merged with localized Qwen cache_position conflict resolution; compileall and repository checkers passed, but light validation unavailable due to pre-existing tes… |
| huggingface#42403 | feature | already_present | none | not_run | Sam3 release on v4.57.3 | current cumulative branch already contains SAM3, SAM3 video, tracker, tracker-video, auto mappings, docs, and tests; PR targets v4.57-release and direct merge … |
| huggingface#42311 | defect | applied | patch | unavailable | Fix: Guard against None num_query_tokens in Blip2Processor (to avoid TypeError) | processor guard applied; compileall and style passed; light validation unavailable (tests_fetcher failed, baseline was also unavailable) |
| huggingface#42310 | feature | merged | merge | unavailable | [WIP] Add moondream3 model | merged cleanly; narrow ruff fixes amended; compileall and style passed; light validation unavailable due to baseline untracked-file checkout issue |
| huggingface#42256 | feature | merged | merge | unavailable | Integrate Core AutoAWQ Inference Components into Transformers | empty merge commit created; compileall and style passed; light validation unavailable due to baseline untracked-file checkout issue |
| huggingface#42228 | defect | merged | merge | unavailable | Support .to(device) or Device Aware Handling for Segmentation Labels in EOMTImageProcessor huggingface#42205 | merged cleanly; compileall and style passed; light validation unavailable due to baseline untracked-file checkout issue |
| huggingface#42134 | feature | merged | merge | unavailable | Add AutoMergeAdapters utility for merging multiple LoRA adapters with… | merged cleanly; narrow ruff import-order fix amended; compileall and style passed; light validation unavailable due to baseline untracked-file checkout issue |
| huggingface#42133 | defect | merged | merge | unavailable | Fix qwen moe Load balancing loss calculation outside training | merged cleanly; compileall and style passed; light validation unavailable due to baseline untracked-file checkout issue |
| huggingface#42131 | feature | already_present | none | unavailable | Mistral Tokenizer Converter Script - Initialization with Pattern Argument | PR intent already present: current MistralConverter call already passes the Tekken tokenizer pattern while using the newer PreTrainedTokenizerFast initializati… |
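
For readers unfamiliar with the guard quoted in the huggingface#42774 row, the pattern can be sketched roughly as below. This is an illustration only, not the Transformers code: the mapping contents and the function name `find_model_type` are hypothetical, and only the condition `extractors is not None and class_name in extractors` is taken from the note.

```python
from typing import Optional

# Hypothetical stand-in for a name mapping whose values may be None,
# e.g. when an optional dependency for that entry is not installed.
MAPPING = {
    "model_b": None,
    "model_a": ("ModelAVideoProcessor",),
}


def find_model_type(class_name: str, mapping: dict) -> Optional[str]:
    """Return the key whose entry lists class_name, or None if absent."""
    for model_type, extractors in mapping.items():
        # The None check must come first: `class_name in None` raises TypeError.
        if extractors is not None and class_name in extractors:
            return model_type
    return None
```

Because `and` short-circuits, the membership test is never evaluated for a `None` entry, so a lookup that passes over such an entry falls through to the next one instead of crashing.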

@evalstate
Owner Author

Feature + defect flow status table (part 2/4)

| PR | Category | Status | Method | Validation | Title | Notes |
| --- | --- | --- | --- | --- | --- | --- |
| huggingface#42098 | defect | merged | merge | unavailable | Fix mel length computation in Qwen2-Audio | merged with a local docstring conflict resolution; mel-length fix applied; compileall and style passed; light validation unavailable due baseline untracked-fil… |
| huggingface#42051 | defect | merged | merge | unavailable | Fix model_input_names singleton issue causing shared state | merge clean; compileall and style passed; light validation unavailable due to baseline untracked-file checkout issue |
| huggingface#41973 | defect | applied | patch | unavailable | Fix import error with huggingface_hub v1.0.0+ | direct merge conflicted with current lazy hub-import refactor; applied the two remaining import-location fixes only; compileall and style passed; light validat… |
| huggingface#41928 | defect | applied | patch | unavailable | fix: add clear error message when mistral-common is missing for AutoTokenizer loading Voxtral | direct merge conflicted with current AutoTokenizer backend refactor; applied the Voxtral missing-mistral-common error check to the current code; compileall and… |
| huggingface#41904 | defect | applied | patch | unavailable | Fix inaccurate eval and train loss computation with variable batch sizes | direct merge conflicted in trainer.py; applied variable-batch loss weighting to current examples and Trainer loop; compileall and style passed; light validatio… |
| huggingface#41901 | feature | merged | merge | unavailable | [executorch] Update pytree registration for DynamicCache | merge clean; compileall and style passed; light validation unavailable because tests_fetcher cannot run with existing untracked generated files and baseline li… |
| huggingface#41895 | feature | merged | merge | unavailable | Add Telugu Sentiment Classification Example using DistilBERT | merge clean; compileall and style passed; light validation unavailable because tests_fetcher cannot checkout merge parent over existing untracked generated fil… |
| huggingface#41879 | defect | already_present | none | not_run | Fix/processor multiple tokenizers | current ProcessorMixin already excludes all tokenizer attributes from to_dict, saves non-primary tokenizers in attribute subfolders, and loads additional token… |
| huggingface#41855 | defect | merged | merge | unavailable | Add Mistral tokenizer missing methods | merge had a docstring conflict resolved in current tokenizer wrapper; formatter-only change amended; compileall and style passed; light validation unavailable … |
| huggingface#41851 | defect | already_present | none | not_run | Fix deepcopy in ProcessorMixin.to_dict for GemmaTokenizerFast | same ProcessorMixin multi-tokenizer/deepcopy fix as PR 41879 is already covered by the current generalized implementation; no code change needed |
| huggingface#41844 | defect | applied | patch | unavailable | Fix FSDPv2 checkpoint saving on TPU by using recursive unwrap | direct PR-head merge conflicted in trainer.py after TPU checkpoint saving moved to integrations/tpu.py; applied recursive FSDPv2 unwrap to the current helper; … |
| huggingface#41827 | defect | merged | merge | unavailable | [Flash Attention] Disable packed sequences with pos ids only during torch compile | merged with local conflict resolution in current flash attention utilities; compileall and style passed; light validation unavailable/timed out after creating … |
| huggingface#41798 | feature | merged | merge | unavailable | p-less Sampling: A Robust Hyperparameter-Free Approach for LLM Decoding | merged with a local GenerationConfig conflict resolution preserving current None defaults and warning gating; compileall and style passed; light validation una… |
| huggingface#41797 | feature | already_present | none | not_run | Add deepseek ocr | current cumulative branch already contains DeepSeek-OCR model/docs/registration; direct merge only exposed conflicts in generated auto mappings and repo-check … |
| huggingface#41776 | feature | merged | merge | unavailable | Add safety checking infrastructure for text generation | merged with a local docs conflict resolution; narrow ruff fixes on PR files amended; compileall and style passed; light validation unavailable/timed out |
| huggingface#41734 | defect | merged | merge | unavailable | Fix CUDA errors in sharded generation with Qwen3 | merged with local generation-utils conflict resolution and narrow ruff fix; compileall and style passed; light validation unavailable/timed out |
| huggingface#41724 | defect | already_present | none | not_run | Fix confusing warning in EncoderDecoderModel when creating decoder_input_ids from labels | intended warning split is already present in current cumulative branch; attempted PR merge only conflicted on an unrelated decoder_attention_mask expression |
| huggingface#41718 | defect | merged | merge | unavailable | AutoTokenizer: clear ImportError when loading Voxtral without mistral-common + unit test | merged with local tokenization_auto conflict resolution preserving v5 remote-code guard; narrow ruff fix amended; compileall and style passed; light validation… |
| huggingface#41701 | defect | merged | merge | unavailable | Fix qwen3_vl mix precision dtype | merged cleanly; compileall and style passed after removing transient untracked files left by light validation; light validation unavailable from tests_fetcher … |
| huggingface#41698 | defect | applied | cherry-pick | unavailable | Fix tokenizer check script: safe dataset access, default checkpoints, and tested in dry-run mode | cherry-picked PR fix commit with narrow conflict resolution and ruff amend; compileall and style passed; light validation unavailable from tests_fetcher as in … |
| huggingface#41687 | defect | merged | merge | unavailable | fix(data): Handle integer labels in DataCollatorWithFlattening | merged with local data_collator/test conflicts resolved for current pack_sequence_labels implementation; narrow ruff format amended; compileall and style passe… |
| huggingface#41631 | defect | already_present | none | not_run | Incorrect access of dataset field fixed | same dataset field access fix is already included via PR 41698 applied earlier in this cumulative branch |
| huggingface#41606 | defect | already_present | none | not_run | fix(processing): Filter kwargs in ProcessorMixin call to prevent Type… | ProcessorMixin has since been refactored to split merged kwargs by modality and pass only images/videos/audio/text kwargs to each subprocessor, so the PR fix i… |
| huggingface#41594 | feature | merged | merge | unavailable | Add beginner-friendly sentiment analysis example | merged cleanly; ruff-only fixes on new example files amended; compileall and style passed; light validation unavailable from tests_fetcher timeout as in baseli… |
| huggingface#41593 | feature | merged | merge | unavailable | examples: add multi-label text classification (BCEWithLogitsLoss, met… | merged with README conflict resolved by preserving existing text-classification README plus adding multi-label section; ruff-only fixes amended; compileall and… |
| huggingface#41592 | defect | already_present | none | not_run | Improve AutoTokenizer error message for Voxtral models missing mistral-common | current AutoTokenizer already raises a clear ImportError for model_type == 'voxtral' when mistral-common is unavailable, so the intended error-message fix is p… |
| huggingface#41561 | feature | merged | merge | unavailable | Optimize Mamba2 memory usage by replacing broadcast with einsum | merged cleanly; compileall and style passed after removing validation-created untracked files; light validation unavailable from tests_fetcher timeout as in ba… |
| huggingface#41528 | feature | already_present | none | not_run | Add position encoding interpolation to DeiT | current DeiT model already has interpolate_pos_encoding support, including dynamic-shape-safe position interpolation and slow inference coverage; the stale PR … |
| huggingface#41524 | feature | applied | patch | unavailable | Add max_eval_batches argument to TrainingArguments | applied max_eval_batches to current TrainingArguments and evaluation_loop; compileall and style passed after removing stale untracked validation artifacts; lig… |
| huggingface#41523 | feature | already_present | none | not_run | Add test coverage for ConvNextImageProcessorFast | current v5 image-processing test mixin already discovers ConvNext pil/torchvision processors from IMAGE_PROCESSOR_MAPPING_NAMES, so fast/backend coverage is pr… |
| huggingface#41522 | defect | applied | patch | unavailable | Fix _init_weights to safely skip int8 quantized weights | applied a non-floating weight guard before Qwen2.5-VL initialization while preserving current rotary initialization; compileall and style passed; light validat… |
| huggingface#41521 | defect | merged | merge | unavailable | Fix forced_bos_token_id not set in generation_config | merged cleanly; compileall and style passed; light validation unavailable from tests_fetcher timeout as in baseline |
| huggingface#41490 | defect | already_present | none | not_run | Fix _init_weights to safely skip int8 tensors in Qwen2_5_VL model | same intended Qwen2.5-VL non-floating/int8 initialization fix was already applied from overlapping PR huggingface#41522 in this cumulative branch |
| huggingface#45702 | defect | merged | merge | unavailable | Reorder decorators for autodoc and dataclass | merged with a local conflict resolution in qwen3_5 preserving existing MTP code while applying the decorator-order fix; compileall and style passed; light vali… |
| huggingface#45699 | feature | already_present | none | not_run | Add FP8 kernel acceleration for compressed-tensors quantized models | PR head changes are already reachable in the cumulative branch, so no additional code change was needed |
| huggingface#45694 | defect | already_present | none | not_run | Fix train_batch_size and eval_batch_size to respect split_batches config | PR head changes are already reachable in the cumulative branch, so no additional code change was needed |
| huggingface#45690 | feature | already_present | none | not_run | [serve] Support for reasoning | PR head changes are already reachable in the cumulative branch, so no additional code change was needed |
| huggingface#45687 | defect | already_present | none | not_run | fix: Made histc_input robust for broader hardware | PR head changes are already reachable in the cumulative branch, so no additional code change was needed |
| huggingface#45683 | defect | already_present | none | not_run | Exclude audio modules from conversion process | PR head changes are already reachable in the cumulative branch, so no additional code change was needed |
| huggingface#45682 | defect | already_present | none | not_run | FIX Restore LoRA hotswapping functionality | PR head changes are already reachable in the cumulative branch, so no additional code change was needed |
| huggingface#45681 | defect | already_present | none | not_run | Restore TokenizersBackend override for DeepSeek V3/R1 tokenizer dispatch | current cumulative branch already honors TokenizersBackend/PythonBackend overrides before trusting tokenizer_config_class and includes DeepSeek regression cove… |
| huggingface#45678 | defect | already_present | none | not_run | Fix shared config mutation issue in flash_attn_from_config | PR head changes are already reachable in the cumulative branch, so no additional code change was needed |
| huggingface#41458 | feature | merged | merge | unavailable | Adding ScatterMoE kernel support for Granite models. | merge clean; compileall and style passed; validation unavailable because light validation tests_fetcher timed out as in baseline |
| huggingface#41441 | feature | merged | merge | unavailable | Enhance the handling of Union types in HfArgumentParser | merge clean; compileall and style passed; validation unavailable because light validation tests_fetcher timed out as in baseline |
| huggingface#41356 | feature | already_present | none | not_run | Add DEIMv2 model, image processor, and basic tests | DEIMv2 model/docs/tests are already present on the cumulative branch; direct merge conflicts add/add with the existing generated/modular implementation, so no … |
| huggingface#41349 | feature | merged | merge | unavailable | Create (3d_parrallel_v2.py) - Add 3D parallelism training example script | merge clean; amended ruff-only fixes in changed example file; compileall and style passed; validation unavailable because light validation tests_fetcher timed … |
| huggingface#41319 | defect | already_present | none | not_run | Use torch._check instead of a test in Gemma3Model | current Gemma3 image-token check already uses torch_compilable_check instead of a data-dependent tensor indexing assertion, preserving torch.export compatibili… |
| huggingface#41313 | defect | merged | merge | unavailable | Jitter noise PR | merged with a local conflict resolution preserving current classifier dtype cast while applying jitter noise only to routing_states; compileall and style passe… |
| huggingface#41304 | defect | merged | merge | unavailable | Fix equality-vs-assignment bug in GptqHfQuantizer.update_device_map | merged with a local conflict resolution applying the device_map assignment fix to the current GPTQ quantizer; compileall and style passed; light validation una… |
| huggingface#41291 | feature | already_present | none | not_run | Add-Deimv2 | DEIMv2 integration is already present on the cumulative branch with newer modular/generated files; direct merge only conflicts add/add with existing DEIMv2 doc… |
| huggingface#41239 | defect | applied | patch | unavailable | Add num_hidden_layers to t5gemma's top level config | applied the T5Gemma top-level num_hidden_layers fix to current modular/generated config after direct merge conflicted with the newer strict dataclass __post_in… |
| huggingface#41224 | feature | merged | merge | unavailable | Add DINOv3ViTForImageClassification support | merged with a small import conflict resolution in DINOv3ViT modeling/modular files; compileall and style passed; light validation unavailable from tests_fetche… |
| huggingface#41169 | defect | merged | merge | unavailable | Fix TorchDynamo crash in StaticCache by validating offloading and offload_only_non_sliding arguments | merged cleanly; compileall and style passed; light validation unavailable from tests_fetcher timeout as in baseline |
| huggingface#41144 | feature | merged | merge | unavailable | Support automatic conversion from zero checkpoint to universal checkpoint. | merged cleanly; amended narrow current-code fixes for self.args and bool annotation; compileall and style passed after removing untracked stale validation-bloc… |
| huggingface#41132 | defect | applied | patch | unavailable | fix(SpeechT5Config): missing annotation on inputs_to_logits_ratio property | direct merge conflicted with old SpeechT5Config init layout; applied the one-line @property fix to current config; compileall and style passed; light validatio… |
| huggingface#41121 | defect | already_present | none | not_run | fix: resolve the unexpected video frame drop issue of the InternVL model with multiple video inputs | current InternVL processor already uses per-video patch boundary indices and has a frames-binding test, so the PR's multi-video frame-drop fix is present; dire… |
| huggingface#41116 | feature | already_present | none | not_run | Add MiniCPM3 | MiniCPM3 support is already present on the cumulative branch from the newer overlapping PR huggingface#45613, including config/model/modular files and auto mappings; no a… |
| huggingface#41105 | defect | applied | patch | unavailable | Fix is_torch_neuroncore_available | applied the narrow NeuronCore check_device fix to current import_utils because PR head carries older import_utils history; compileall and style passed; light v… |
| huggingface#41077 | defect | already_present | none | not_run | Fix: add num_hidden_layers property to T5GemmaConfig and add test for use_cache | same T5Gemma num_hidden_layers fix was already applied from overlapping PR huggingface#41239 to both modular and generated config; no code change needed |
| huggingface#41075 | defect | merged | merge | unavailable | Fix Qwen3 deterministic generation when do_sample=False and num_beams=1 for Greedy Decoding | merged cleanly; compileall and style passed; light validation unavailable from tests_fetcher timeout as in baseline |
| huggingface#41041 | feature | merged | merge | unavailable | [WIP] Add YuE model | merged cleanly; amended narrow ruff fixes in new YuE files; compileall and style passed; light validation unavailable from tests_fetcher timeout as in baseline |
| huggingface#41033 | feature | merged | merge | unavailable | feat: make audio feature extractors torch.export-able | merged with a trivial tests/test_executorch.py import conflict resolved; compileall and style passed; light validation unavailable from tests_fetcher timeout a… |
| huggingface#40976 | feature | merged | merge | unavailable | Better defaults for assisted generation | merged with current candidate-generator min/max length handling kept; amended narrow UP045 ruff fix; compileall and style passed; light validation unavailable … |
| huggingface#40954 | defect | already_present | none | not_run | Fix Issue huggingface#40913: Respect user-provided chat_template parameter in processor creation | current ProcessorMixin already preserves user-provided chat_template because processor_dict is updated with kwargs after loading chat_templates; PR test file i… |
| huggingface#40908 | defect | merged | merge | unavailable | Fix incompatible with | merged cleanly; compileall and style passed; light validation unavailable as in baseline |
| huggingface#40898 | feature | merged | merge | unavailable | Adding [T5/MT5/UMT5]EncoderForSequenceClassification | merged with a trivial T5 test import conflict resolved; amended narrow py313 typing fixes in changed model files; compileall and changed-file ruff passed; full… |
| huggingface#40861 | feature | merged | merge | unavailable | Support n_groups>1 for mamba2 | merged with a narrow Mamba2 init conflict resolution preserving current lazy weight initialization; compileall and changed-file ruff passed; full style and lig… |
| huggingface#40857 | defect | already_present | none | not_run | Token | current Trainer already computes train_tokens_per_second from current-session tokens via _initial_num_input_tokens_seen; direct merge conflicts against the ref… |
| huggingface#40840 | feature | merged | merge | unavailable | feat: add qwen2 pruning support | merged with a small Qwen2 conflict resolved against current layer_type handling and single-quote style normalized; compileall and changed-file ruff passed; ful… |
| huggingface#40790 | defect | applied | patch | unavailable | Handle loading non-existent checkpoints or corrupted checkpoints. | applied minimal missing/corrupt checkpoint handling in Trainer and get_last_checkpoint; compileall and changed-file ruff passed; full style/light validation un… |
| huggingface#40783 | defect | merged | merge | unavailable | Fix None quantization_config equivalence with omitted param in AutoModel.from_pretrained | merged cleanly; compileall and changed-file ruff passed; full style/light validation unavailable as in baseline |
| huggingface#40759 | feature | already_present | none | not_run | feat: add qwen3 pruning support | Qwen3 structured-pruning changes are already contained in the earlier merged PR huggingface#40840; merge reported already up to date |
| huggingface#40756 | feature | merged | merge | unavailable | [WIP] Add Canary | merged cleanly; compileall and changed-file ruff passed; full style/light validation unavailable as in baseline |
| huggingface#40755 | feature | merged | merge | unavailable | [TimesFM] Add support for forecasting with covariates | merged cleanly; compileall and changed Python-file ruff passed; full style/light validation unavailable as in baseline |
| huggingface#40740 | defect | merged | merge | unavailable | Configure assistant model's generation_config with user parameters | merge clean; validation unavailable due to pre-existing ruff failures and untracked files blocking tests_fetcher checkout |
| huggingface#40695 | defect | already_present | none | not_run | remove vision2seq vs image-text-to-text | intended fix already present: current modeling_auto.py has no MODEL_FOR_VISION_2_SEQ mapping containing Qwen vision2seq entries |
| huggingface#40587 | feature | merged | merge | unavailable | feat(utils): add vision utils for embedding images and getting the hidden size | merge clean; validation unavailable because baseline already fails ruff on pre-existing untracked legacy/restored files and light validation tests_fetcher time… |
| huggingface#40563 | defect | already_present | none | not_run | fix to get output of intermediate output of dinov3 for more use case | intended DINOv3 intermediate hidden-state support is already present: current DINOv3ViTModel delegates to DINOv3ViTEncoder with capture_outputs and returns out… |
| huggingface#40520 | feature | merged | merge | unavailable | [generate] add faster stop_strings stopping criteria | merge completed with a small test conflict resolved; validation unavailable because baseline already fails ruff on pre-existing untracked/restored files and li… |
| huggingface#40515 | feature | merged | merge | unavailable | Add Context-Aware Tokenizer Selection Utility Based on Corpus Analysis | merge clean; validation unavailable because baseline already fails ruff on pre-existing untracked/restored files and light validation tests_fetcher times out; … |
| huggingface#40492 | defect | merged | merge | unavailable | avoid divid zero errors. | merge completed; masking_utils None guard was already present in current code, and image_processing_utils/debug_utils changes were applied; validation unavaila… |
| huggingface#40438 | defect | merged | merge | unavailable | Resolve automatic label name detection when single label provided | merge completed with Trainer conflict resolved by applying the label_names=[label] reset to the current staged Trainer initialization; compileall and PR-local … |
| huggingface#40392 | defect | applied | cherry-pick | unavailable | Remove debug print statement from ShieldGemma2 conversion script | cherry-pick applied the one-line debug-print removal without dragging older branch history; compileall passed, but full validation unavailable due to pre-existing… |
| huggingface#40388 | feature | applied | cherry-pick | unavailable | Add fromjson filter to Jinja2 chat templates | cherry-pick applied the fromjson filter and tests without pulling older branch history; compileall and PR-local ruff passed, full validation unavailable due to pr… |
| huggingface#40385 | defect | applied | cherry-pick | unavailable | Fix typo: 'seperate' -> 'separate' in mm_grounding_dino conversion sc… | cherry-pick applied the MM Grounding DINO conversion key correction; compileall and PR-local ruff passed, full validation unavailable due to pre-existing baseline… |
| huggingface#40358 | defect | applied | patch | unavailable | Fix MXFP4 mlp_forward to handle 2D and 3D hidden_states shapes for multi-turn chat | applied the MXFP4 mlp_forward dimensionality fix only; direct merge/cherry-pick would include older-history test/CI churn, while current integration code has d… |
| huggingface#40244 | defect | merged | merge | unavailable | add-loftr-keypoints-to-map | merge required a trivial auto-mapping conflict resolution; validation unavailable due to baseline ruff failures and tests_fetcher backend error |
| huggingface#40221 | defect | applied | patch | unavailable | FIX: enable load_best_model_at_end within SaveStrategy.BEST and initialize metric_for_best_model as loss when SaveStrategy.BEST | direct merge conflicted in trainer files after test/module refactors; ported remaining metric_for_best_model default change, while trainer SaveStrategy.BEST ch… |
| huggingface#40208 | defect | applied | patch | unavailable | Save only model sharded sd | direct merge conflicted in trainer after refactors; ported the output_dir creation, FSDP sharded model save path, and removal of the incompatible validation; v… |
| huggingface#40148 | defect | merged | merge | unavailable | Update utils.py: fix nan | merge clean; compileall and PR-local ruff passed, but full validation unavailable due to pre-existing baseline ruff failures and light-validation tests_fetcher ch… |
| huggingface#40123 | defect | already_present | none | unavailable | Lazily import torchao Int4WeightOnlyConfig to avoid side effects | intended torchao eager-import side effect is already absent in current code after quantization/loading refactors; direct merge conflicts in obsolete modeling_u… |
| huggingface#40115 | feature | already_present | none | unavailable | [layer_types] update layer_types with conv | conv is already allowed in current ALLOWED_LAYER_TYPES and Lfm2Config already derives conv/full_attention layer_types in the refactored dataclass config layout… |
| huggingface#40114 | defect | applied | patch | unavailable | Fix torch.export compatibility for Mixtral MoE models | ported the Mixtral inference-time static expert loop to current optimized modular/generated MixtralExperts; compileall and PR-local ruff passed, but full valid… |
| huggingface#40092 | feature | merged | merge | unavailable | Optimize LlamaAttention by fusing QKV projections | merge clean; amended a PR-local blank-line whitespace fix; compileall and PR-local ruff passed, but full validation unavailable due to baseline ruff failures and … |
| huggingface#40090 | defect | applied | patch | unavailable | Fix RuntimeError when loading quantized models with int8 weights (huggingface#39366) | ported non-floating weight/bias initialization guards to current modeling_utils; compileall and PR-local ruff passed, but full validation unavailable due to basel… |
| huggingface#40065 | defect | applied | patch | unavailable | Delay float32 upcast in ForCausalLMLoss after filtering ignore_index | ported the ForCausalLMLoss ignore_index filtering before float32 upcast; compileall and PR-local ruff passed, but full validation unavailable due to baseline ruff… |
| huggingface#40059 | defect | applied | patch | unavailable | Fix Inefficient GELU implementation in GPT2 | ported GPT-2 default activation from gelu_new to fused gelu; compileall and PR-local ruff passed, but full validation unavailable due to baseline ruff failures an… |
| huggingface#40058 | feature | applied | patch | unavailable | GGUF Qwen2VL | ported Qwen2VL GGUF config/model-type/tokenizer converter support and coverage to current GGUF files; compileall and PR-local ruff passed, but full validation … |
| huggingface#40055 | feature | merged | merge | unavailable | Auto-log parallelism info to wandb.config using HF Accelerate | merge clean; amended PR-local W293/S110 lint fixes in wandb callback; compileall and PR-local ruff passed, but full validation unavailable due to baseline ruff fa… |
| huggingface#40023 | feature | already_present | none | unavailable | Add support for SDPA for OWLViT and OWLv2 | OWLViT/OWLv2 already use ALL_ATTENTION_FUNCTIONS with _supports_sdpa and have SDPA parity tests in current branch; direct merge conflicted only with the newer … |
| huggingface#40022 | defect | applied | patch | unavailable | fix: resolve dropout type error in DogeDecoder | ported Doge MoE tuple unwrapping before dropout plus router_gate zero initialization to current modular/generated Doge; compileall and PR-local ruff passed, bu… |
| huggingface#39999 | defect | applied | patch | unavailable | allow TP to work in ND-parallel with fsdp cpu ram efficient loading | ported TP+FSDP meta device_map handling into current initialize_tensor_parallelism helper; compileall and PR-local ruff passed, but full validation unavailable… |
| huggingface#39997 | defect | merged | merge | unavailable | make sure position_ids are passed in for causal mask creation for gpt-oss | merge clean; compileall and PR-local ruff passed, but full validation unavailable due to baseline ruff failures and tests_fetcher backend issue |
| huggingface#39941 | defect | merged | merge | unavailable | fixing image_utils.py todo | merge conflict was limited to imports in test_image_utils; resolved and merged validate_kwargs warning behavior; compileall and PR-local ruff passed, but full … |
| huggingface#39895 | feature | merged | merge | unavailable | Add Videoprism | merge clean; compileall and PR-local ruff passed, but full validation unavailable due to pre-existing baseline ruff failures and tests_fetcher timeout/checkout is… |
| huggingface#39866 | defect | merged | merge | unavailable | make sure model.save_pretrained has the correct is_main_process | merge conflict was limited to current save_pretrained call shape; resolved by passing is_main_process plus save_safetensors; compileall and PR-local ruff passe… |
| huggingface#39794 | defect | applied | patch | unavailable | Fix ProphetNet forward to handle tuple encoder_outputs | ported ProphetNet tuple encoder_outputs conversion to current forward implementation; compileall and PR-local ruff passed, but full validation unavailable due to … |
| huggingface#39793 | defect | merged | merge | unavailable | Fix DAC conversion script | merge clean; compileall and PR-local ruff passed, but full validation unavailable due to baseline ruff failures and tests_fetcher checkout blockage on untracked c… |
| huggingface#39785 | defect | merged | merge | unavailable | fix mllama integration tests | merge conflict was limited to Mllama vision encoder hidden-state collection; resolved to include initial and final encoder states; compileall and PR-local ruff… |
| huggingface#39741 | defect | merged | merge | unavailable | Fix HfArgumentParser to filter out dict types from Union | merge clean; compile passed, but baseline validation already fails/unavailable due to existing untracked/generated legacy files causing global ruff failures and t… |
| huggingface#39735 | defect | already_present | none | unavailable | handle multimodal models with tp_plan on the text_config | direct PR conflicted with the newer distribute_model signature, but the current branch already routes tp_plan through model.tp_plan populated from submodules/t… |
| huggingface#39698 | defect | applied | patch | unavailable | Fix exaone4 layer_types ZeroDivision/TypeError when sliding_window_pattern is None/"LLLG" | patch applied; compile and ruff on changed files passed, but full validation unavailable because baseline global ruff failures and tests_fetcher/import-structu… |
| huggingface#39697 | defect | merged | merge | unavailable | use untyped storage for dtensors due to deprecation | merge clean; validation unavailable because baseline has global ruff failures and tests_fetcher import-structure failure; compileall passed |
| huggingface#39690 | feature | applied | patch | unavailable | Allow custom hf_quantizer in from_pretrained | patch applied; compileall and narrow ruff on changed files passed, but full validation unavailable due to baseline global ruff/test-fetcher failures |
| huggingface#39683 | defect | applied | patch | unavailable | Fix issue huggingface#39191 respect accelerate config to disable torch.dynamo compilation | patch applied; compileall and narrow ruff on changed file passed, but full validation unavailable due to baseline global ruff/test-fetcher failures |
| huggingface#39675 | defect | already_present | none | unavailable | [BugFix]: Support dict and config file path for deepspeed | direct merge conflicted because TrainingArguments has been reorganized; current branch already types deepspeed as dict-or-path and passes it unchanged to HfTra… |
| huggingface#39674 | defect | applied | patch | unavailable | Fix loss scaling and token aggregation to use only data parallel group | patch applied; compileall and narrow ruff on changed file passed, but full validation unavailable due to baseline global ruff/test-fetcher failures |
| huggingface#39625 | defect | already_present | none | unavailable | Fix: allow Union[str, dict, None] fields like deepspeed to be passed via CLI | direct merge conflicted in hf_argparser.py, but the current parser already filters dict out of Union types before argparse handling, so Union[str, dict, None] … |
| huggingface#39617 | defect | already_present | none | unavailable | Fix FSDP v1 bug: trainer incorrectly uses an unwrapped model | direct merge conflicted in Trainer setup, but the current branch already assigns the result of accelerator.prepare(self.model) to the local model and then stor… |
| huggingface#39599 | defect | applied | patch | unavailable | Fix: check TrainerState file exists before loading during resume | patch applied; compileall and narrow ruff on changed file passed, but full validation unavailable due to baseline global ruff/test-fetcher failures |
| huggingface#39560 | defect | applied | patch | unavailable | fix load_model_end = true work when save_steps < eval_steps | patch applied; compileall and PR-local ruff passed, but full validation unavailable due to baseline global ruff/test-fetcher failures |
| huggingface#39555 | feature | already_present | none | unavailable | [WIP] try to relax the tie_weights method | direct merge conflicted in modeling_utils.py, but current tie_weights has already been refactored to rely on expanded tied-weight mappings and no longer contai… |
| huggingface#39493 | defect | applied | patch | unavailable | [Voxtral] nit + pin correct mistral common version | ported Voxtral mistral-common dependency extras to current >=1.10.0 pin; compileall and PR-local ruff passed, global style/light validation matched failing bas… |
| huggingface#39491 | defect | applied | patch | unavailable | Fix: Skip weight initialization for quantized int8 models | skips missing-key weight initialization for quantized models; compileall and PR-local ruff passed, global style/light validation matched failing baseline |
| huggingface#39468 | defect | applied | patch | unavailable | Fix quantized model dispatch with device_map='auto' | skips accelerate dispatch_model for bitsandbytes quantized models; compileall and PR-local ruff passed, global style/light validation matched failing baseline |
| huggingface#39464 | defect | already_present | none | unavailable | Skipping initialize_weights when model is quantized | the cumulative branch already skips _initialize_missing_keys weight initialization when is_quantized is true from the prior quantized-init port |
| huggingface#39456 | defect | already_present | none | unavailable | Fix quantized model initialization for int8 dtypes | current cumulative branch already skips missing-key initialization for quantized models, covering the PR's modeling_utils fix |
| huggingface#39449 | defect | already_present | none | unavailable | Fix logger warnings in Gemma model test files | tracked Gemma/Gemma2/Gemma3/Gemma3n files no longer contain the logger.warning_once calls changed by the PR; only an existing untracked legacy Flax artifact st… |
| huggingface#39435 | defect | applied | patch | unavailable | Add a unit test for BartModel to compare eager, sdpa on one particular set of inputs | added Bart eager/SDPA mask regression test; compileall and PR-local ruff passed, targeted pytest/light validation unavailable due baseline import-structure bac… |
| huggingface#39309 | defect | merged | merge | unavailable | Fix audio pipeline with torchcodec input | merged with conflict resolution preserving current torchcodec guard and mono-channel handling; compileall passed and PR-local ruff passed, but full checker and… |
| huggingface#39264 | defect | already_present | none | unavailable | Fix: Add version check for timm to support mobilenetv5 models (fixes huggingface#39208) | current TimmWrapper already routes model creation through _create_timm_model_with_error_handling, raising an ImportError that asks users to upgrade timm for un… |
| huggingface#39257 | defect | merged | merge | unavailable | Fix to tuple conversion with config | merged with conflict resolution keeping current helper functions and applying the return_dict=True fix; compileall and PR-local ruff passed, but full checker/l… |
| huggingface#39222 | feature | already_present | none | unavailable | Enable granite 4 hybrid integration tests | current test already has GraniteMoeHybrid slow integration tests enabled against the newer ibm-granite/granite-4.0-h-tiny checkpoint with updated expected logi… |
| huggingface#39211 | defect | already_present | none | unavailable | Add mobilenet_v5 stub implementation to fix "Unknown Model" error | current cumulative branch already addresses mobilenetv5_300m_enc loading via TimmWrapper _create_timm_model_with_error_handling and timm-version upgrade guidan… |
| huggingface#39206 | defect | applied | patch | unavailable | fix: filter None router logits in Qwen3 MoE and handle empty router logits (huggingface#39203) | guarded Qwen3 MoE load-balancing loss against empty router-logit tuples; compileall and PR-local ruff passed, while full checker/light validation remain unavai… |
| huggingface#39183 | feature | applied | patch | unavailable | Add a 'chat' extra | added a chat optional extra for current rich/requests chat CLI dependencies and documented installing it; compileall and setup.py ruff passed, global validatio… |
| huggingface#39150 | feature | already_present | none | unavailable | Efficient Expert Weight Fusion for Moe deepseek v3 | current DeepseekV3MoE already routes through DeepseekV3NaiveMoe/MixtralExperts for fused/vectorized expert execution; the PR's older manual weight-stacking imple… |
| huggingface#39140 | feature | already_present | none | unavailable | feat(trainer): emergency checkpointing on crashes & SIGTERM/SIGINT | current Trainer already has opt-in enable_jit_checkpoint signal-based checkpointing with CheckpointManager coverage; PR emergency checkpointing conflicts with … |
| huggingface#39108 | defect | applied | patch | unavailable | Disable static cache on certain MoE models | disabled fullgraph compilation for DeepseekV3 and Dots1 MoE classes; compileall and PR-local ruff passed, full validation remained baseline-unavailable |
| huggingface#39103 | defect | applied | patch | unavailable | Fix audio-related config naming for Gemma3n | direct merge conflicted in Gemma3n files; applied the small naming fix locally; compileall passed but full validation is unavailable because baseline ruff alre… |
| huggingface#39047 | feature | merged | merge | unavailable | RFC: refactor causal lm loss to handle lm_head in loss function | merged with a local conflict resolution in loss_utils for current typing style; compileall passed but full validation is unavailable because baseline ruff alre… |
| huggingface#39037 | defect | merged | merge | unavailable | fix kosmos2 tests | merged with local conflict resolution adapting the is_causal fix to the current Cache/past_key_values API; compileall passed but full validation is unavailable… |
| huggingface#39012 | defect | already_present | none | unavailable | [WIP] Fix DeepseekV3ModelTest::test_torch_compile_for_training | direct merge would drag a large upstream-main merge; the intended DeepSeekV3 change (iterate only matched experts, avoiding token_indices.numel dynamic-shape g… |
| huggingface#39009 | feature | merged | merge | unavailable | Add submodels support check function | merged cleanly; compileall passed but full validation is unavailable because baseline ruff already fails on pre-existing untracked/generated files |
| huggingface#38999 | defect | already_present | none | unavailable | Use deep copies instead of shallow copies for bbox_embed in GroundingDINO decoder (huggingface#37333). | current GroundingDINO already builds separate bbox_embed modules via a comprehension and manages tie keys for the shared/non-shared modes; direct PR conflict a… |
| huggingface#38908 | defect | already_present | none | unavailable | Add support to use config dtype in HybridChunkedCache | current HybridChunkedCache already uses getattr(config, 'torch_dtype', dtype); direct merge conflicts with a moved get_mask_sizes/HybridCache region from the o… |
| huggingface#38888 | defect | merged | merge | unavailable | continue to fix distributed_type from TPU to XLA in LM examples (huggingface#38652) | merge clean; validation unavailable because baseline already fails checkers on pre-existing untracked legacy/generated files and light validation tests_fetcher… |
| huggingface#38886 | feature | merged | merge | unavailable | Allow compile with bnb | merge completed with straightforward conflict resolution in bnb quantizers to keep current weight conversion methods and add PR is_compileable properties; full… |
| huggingface#38884 | defect | merged | merge | unavailable | Llama 4 conversion fix for moe models | merge completed with simple conflict resolution choosing the PR moe_args None check; validation unavailable due to baseline checkers/tests_fetcher failures; narro… |
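
Read together, the notes in these rows suggest a simple rule for the validation column: a candidate is judged only against checks that the baseline itself passes. When the baseline already fails or times out, the result is recorded as unavailable; when the baseline passes and the candidate does not, the PR is marked validation_failed and the merge is reset. A minimal sketch of that rule follows. It is inferred from the row notes, and `Outcome` and `classify_validation` are hypothetical names, not taken from the mergeability tooling.

```python
from enum import Enum


class Outcome(Enum):
    # Labels mirror the status/validation wording used in the report rows.
    PASSED = "passed"
    FAILED = "validation_failed"
    UNAVAILABLE = "unavailable"


def classify_validation(baseline_ok: bool, candidate_ok: bool) -> Outcome:
    """Judge a candidate only when the baseline check is itself healthy.

    Assumption: this two-flag rule is a reading of the report notes,
    not the actual logic of the tool that generated them.
    """
    if not baseline_ok:
        # Baseline already fails or timed out, so the candidate result carries no signal.
        return Outcome.UNAVAILABLE
    return Outcome.PASSED if candidate_ok else Outcome.FAILED


if __name__ == "__main__":
    # Baseline broken: nothing can be concluded about the candidate.
    print(classify_validation(baseline_ok=False, candidate_ok=True).value)
    # Baseline healthy, candidate timed out or failed: revert the merge.
    print(classify_validation(baseline_ok=True, candidate_ok=False).value)
```

Under this reading, an "unavailable" entry says nothing about the quality of the PR itself, only that the baseline could not serve as a reference.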

Rejected / not included records (379)

| PR | Category | Status | Original PR summary / goal | Rejection reason |
| --- | --- | --- | --- | --- |
| huggingface#45679 | other | skipped | TST Run fast PEFT tests in normal CI | category not configured for this cumulative branch |
| huggingface#45667 | other | skipped | chore(typing): add ty type checking for 3 pipeline files | category not configured for this cumulative branch |
| huggingface#45666 | feature | validation_failed | Extended n-to-1 kernel fusion via | light validation failed in tests_fetcher after merge: invalid relative type-checking imports in src/transformers/integrations/hub_kernels.py produced dependency key src/transformers/integrations/mode… |
| huggingface#45664 | documentation | skipped | Doc translate to Persian(farsi) | category not configured for this cumulative branch |
| huggingface#45612 | documentation | skipped | [docs] update model cards | category not configured for this cumulative branch |
| huggingface#45608 | documentation | skipped | Python code in model docs | category not configured for this cumulative branch |
| huggingface#45604 | feature | aborted | Agent first cli with skill | PR targets upstream/agent-first-cli, not main; direct merge is clean but would also import the absent agent-first-cli base branch, while the PR-only diff depends on missing src/transformers/cli/agent… |
| huggingface#45569 | feature | aborted | Proper nemotron H and 3 and 2 | codebase moved on in cumulative branch: PR restructures Nemotron-H into dense/sparse variants and deletes modular_nemotron_h.py, overlapping the already-merged huggingface#45591 _no_reinit initialization fix in… |
| huggingface#45550 | other | skipped | Add runner selection for mi325 GPU type | category not configured for this cumulative branch |
| huggingface#45543 | other | skipped | ci: OTEL support | category not configured for this cumulative branch |
| huggingface#45534 | feature | aborted | 🚨 [ALM] Add base model without head | codebase moved on: ALM base-model PR conflicts across auto mappings and multiple generated/model modular files where current branch has newer model mappings/classes (e.g. Granite4Vision/ForConditiona… |
| huggingface#45476 | other | skipped | [Don't merge] Call CI workflow | category not configured for this cumulative branch |
| huggingface#45465 | documentation | skipped | [docs] contributing | category not configured for this cumulative branch |
| huggingface#45462 | other | skipped | chore(sec): added a handful of security checks | category not configured for this cumulative branch |
| huggingface#45453 | feature | aborted | Draft commit | codebase moved on: TP loading refactor conflicts deeply with current core_model_loading weight-mapping/direct-load flow and existing tensor_parallel/finegrained_fp8 updates; resolving would require r… |
| huggingface#45452 | other | skipped | refactor: replace wildcard imports with explicit imports in model `__init__.py` files | category not configured for this cumulative branch |
| huggingface#45426 | feature | aborted | Feature/add axk1 | codebase moved on: new-model auto registration conflicts with the current auto mapping structure; integrating AXK1 would require adapting generated/config/model mappings to the newer auto-mappings fl… |
| huggingface#45421 | defect | aborted | Improve nested base_model_prefix handling in weight conversion and loading | codebase moved on: nested-prefix changes conflict deeply with the current core_model_loading transform pipeline, which now uses ordered WeightTransform handling rather than the PR's separate renaming… |
| huggingface#45415 | other | skipped | Adds type checking to src/transformers/*py | category not configured for this cumulative branch |
| huggingface#45401 | feature | aborted | Add support for Voxtral-4B-TTS-2603 to transformers | codebase moved on: new-model auto configuration registration conflicts with the current generated/auto-mappings structure; a safe resolution would require regenerating/adapting Voxtral TTS against th… |
| huggingface#45396 | feature | aborted | Extract dynamic vision/audio tensors into standalone pure functions | codebase moved on: the shared vision/audio tensor extraction conflicts across multiple actively changed modular/generated multimodal models, including newer Qwen3.5 MTP code and GLM/Qwen3 VL changes;… |
| huggingface#45382 | feature | validation_failed | Add AudioGen (AudioCraft) to MusicGen conversion scripts | compileall and style passed after mechanical ruff format amend, but light validation timed out in run-light-validation after repeated 240-290s attempts |
| huggingface#45363 | feature | aborted | n-to-1 kernel fusion via KernelConfig | codebase moved on: the kernel fusion series conflicts in utils/kernel_config.py and hub_kernels after newer KernelConfig/hub-kernel changes; the PR branch also merged main, so direct merge would drag… |
| huggingface#45360 | other | skipped | Replace deprecated huggingface-cli references with hf | category not configured for this cumulative branch |
| huggingface#45355 | feature | aborted | Add universal phone recognition model - PhoneticXeus | codebase moved on: new-model auto configuration registration conflicts with the current auto-mapping structure; integrating PhoneticXeus would require regenerating/adapting auto mappings and docs aga… |
| huggingface#45332 | feature | aborted | Add heterogeneous model support (per-layer config and modeling) | codebase moved on: heterogeneous modeling changes overlap current cache layer registry handling and masking API additions; resolving would require reconciling per-layer heterogeneous layer_idx logic … |
| huggingface#45321 | defect | validation_failed | Remove references to torchao's AffineQuantizedTensor | compile and repository checkers passed, but light validation timed out twice with no pytest result after merging TorchAO AffineQuantizedTensor removal |
| huggingface#45317 | defect | validation_failed | Fix AttributeError in _patch_mistral_regex when fix_mistral_regex=True | narrow cherry-pick avoided unrelated PR-head history and compile/checkers passed, but selected auto-tokenizer validation failed because PixtralProcessor requires missing torchvision |
| huggingface#45296 | feature | aborted | Add GGUF support to Gemma4 (31B & 26B-A4B) text | codebase moved on: Gemma4 GGUF tensor processor/config conversion conflicts with newer GGUF support added for Llama4/Qwen/GPT-OSS in modeling_gguf_pytorch_utils and ggml tests; integrating would requ… |
| huggingface#45267 | documentation | skipped | Add docstring to FFN.forward in DistilBERT | category not configured for this cumulative branch |
| huggingface#45254 | other | skipped | Fix more integration tests for important models | category not configured for this cumulative branch |
| huggingface#45244 | other | skipped | Let's CI go great | category not configured for this cumulative branch |
| huggingface#45213 | other | skipped | DO NOT MERGE - model creation skill | category not configured for this cumulative branch |
| huggingface#45189 | feature | validation_failed | Add doc test CI workflow reusing existing model job infrastructure | merge was clean but light validation timed out after compile/checkers passed |
| huggingface#45186 | feature | aborted | Add new model: Isaac | codebase moved on: new-model auto mappings and conversion mapping conflict with cumulative branch additions; resolving a 96-commit generated/model addition would require regenerating mappings and bro… |
| huggingface#45181 | feature | aborted | Make the cli a top-level package | codebase moved on: cumulative branch already changed the CLI entrypoint for agentic CLI support, conflicting with the PR split into a transformers_cli package |
| huggingface#45176 | feature | aborted | added efficietvitsam model to HF | codebase moved on: new EfficientViT-SAM auto mappings conflict with cumulative branch model registry changes; safe resolution would require regenerated auto mappings and full model review |
| huggingface#45153 | feature | aborted | [FA] Native torch integration | codebase moved on: native torch FlashAttention changes conflict with cumulative branch attention-sink/layer_idx handling and newer split_attention_implementation/import-utils availability APIs; resol… |
| huggingface#45152 | documentation | skipped | [docs] model testing | category not configured for this cumulative branch |
| huggingface#45149 | feature | aborted | DO NOT MERGE adding SAML3-LiteText with a skill, first pass | codebase moved on: Sam3 Lite Text files and docs already exist on the cumulative branch and PR also conflicts with generated auto mappings; title is DO NOT MERGE and resolving add/add generated model… |
| huggingface#45144 | feature | aborted | Add Xiaomi MiMo-V2 | codebase moved on: new MiMo-V2 model registration conflicts with cumulative branch docs/model registries and auto mappings; resolving an 85-commit generated model addition would require regeneration … |
| huggingface#45133 | feature | validation_failed | Add sarvam model | merge was clean but ruff reported many non-mechanical undefined names/missing imports in new Sarvam modeling/modular files and configuration `__all__`; reset merge |
| huggingface#45115 | feature | aborted | Refactor/nemotron h inherit granitemoehybrid | codebase moved on: Nemotron-H refactor conflicts with current generated/model modular code and auto configuration mappings; the PR replaces local block/cache structure with GraniteMoeHybrid inheritan… |
| huggingface#45114 | documentation | skipped | fix: lets fix all doctests | category not configured for this cumulative branch |
| huggingface#45113 | feature | validation_failed | Add GDS support for safetensors loading | merge conflicts were mechanically resolved and compile/checkers passed, but run-light validation timed out after 300s while baseline light validation passed; reset merge |
| huggingface#45110 | feature | aborted | Add SAM 3.1 | codebase moved on: SAM 3.1 model addition conflicts with current generated model registries/auto mappings and existing SAM3 conversion script processor changes; safe integration would require regener… |
| huggingface#45101 | feature | aborted | Adding support for Nandi Models | codebase moved on: Nandi model addition conflicts with the current generated auto configuration registry; integrating the new model would require regeneration/adaptation against the cumulative branch… |
| huggingface#45097 | feature | aborted | Add old InternVL2-1B/2B support to the InternVL conversion script huggingface#45092 | codebase moved on: InternVL conversion support conflicts across conversion mappings, current auto image/video registries, InternVL processor token handling, and core loading tests; resolving would re… |
| huggingface#45077 | defect | aborted | fix: pin 50 unpinned actions to commit SHA, extract 1 secret to env var | codebase moved on: workflow hardening conflicts across many GitHub Actions files whose current versions already include newer pinned actions/formatting; resolving 13 workflow conflicts would require … |
| huggingface#45073 | feature | validation_failed | Refactor OwlViT to modular Transformers | merge conflict was mechanically resolved and compile/checkers passed, but light validation timed out after 300s while baseline passed; reset merge |
| huggingface#45067 | feature | validation_failed | feat: trainer resume_from_checkpoint support hub downloads (huggingface#43375) | clean merge and compile/checkers passed, but light validation timed out after 300s while baseline passed; reset merge |
| huggingface#45064 | feature | aborted | refactor: shard checkers | codebase moved on: checker sharding conflicts with current Makefile and utils/checkers.py caching/timing/streaming structure; resolving would require CI checker policy review rather than a safe mecha… |
| huggingface#45060 | defect | validation_failed | Fix PIL backend fallback when torchvision is unavailable | clean merge and compile/checkers passed, but light validation timed out after 300s while baseline passed; reset merge |
| huggingface#45056 | defect | validation_failed | [] needs to be only run on doc | merge conflict was mechanically resolved and compile/checkers passed, but light validation timed out after 300s while baseline passed; reset merge |
| huggingface#45055 | defect | aborted | Save model config in Trainer checkpoints for non-PreTrainedModel models | codebase moved on: Trainer save-path patch conflicts in a region that has been substantially reorganized around current gradient/save helpers; safely placing the config save would require reviewing c… |
| huggingface#45037 | documentation | skipped | add missing colon in custom_attention function signature in attention… | category not configured for this cumulative branch |
| huggingface#45034 | defect | aborted | Pass packed boundary metadata to Qwen3.5 linear-attention fast kernels from data collator | codebase moved on: Qwen3.5 linear-attention packed-metadata changes conflict with current gated-delta cache correctness logic in modular/generated model code and overlapping Qwen3.5 tests; resolving … |
| huggingface#45028 | feature | aborted | TP refactor for FSDP + TP integration | codebase moved on: TP/FSDP refactor targets an older fsdp-vs-ddp branch and direct merge drags 65 commits across many model/configuration files plus temporary scripts; conflicts in core_model_loading… |
| huggingface#45017 | defect | aborted | [WIP][Fix] GLM 5 set apply_rotary_pos_emb to is_neox_style=False && remove F.relu() | codebase moved on: WIP GLM5 DSA fix now conflicts with current finegrained FP8 integration and modular/generated GLM MoE DSA implementation; PR also adds new dsa_kernels/FP8 index path beyond the tit… |
| huggingface#44981 | defect | validation_failed | Trainer: set skip_logits for loss-only eval when liger enabled | merge was clean and compile/checkers passed, but light validation timed out after 300s on the Trainer changes while baseline passed; reset merge |
| huggingface#44979 | feature | validation_failed | Module Fusion API | merge was clean and compile/checkers passed, but light validation timed out after 300s while baseline passed; reset merge |
| huggingface#44974 | feature | aborted | Refactor core_model_loading to support FSDP shard-on-read loading | codebase moved on: FSDP shard-on-read core loading refactor is based on the fsdp-vs-ddp workstream and conflicts with current core_model_loading/modeling_utils plus modeling/core-loading tests; resol… |
| huggingface#44973 | defect | validation_failed | Fix max_seqlen type in vision attention for torch.compile + FA2 | merge was clean and compile/checkers passed, but light validation timed out after 300s while baseline passed; reset merge |
| huggingface#44965 | other | skipped | try | category not configured for this cumulative branch |
| huggingface#44958 | defect | validation_failed | fixed import error with PILImageResampling | merged cleanly and mechanical ruff fix was amended; compile/checkers then passed, but light validation timed out after 300s while baseline passed; reset merge |
| huggingface#44956 | feature | aborted | Add HyperCLOVAX SEED Think 14B | codebase moved on: HyperCLOVAX model addition conflicts with current docs toctree, models package init, and generated auto configuration/modeling registries; integrating a new generated model safely … |
| huggingface#44942 | feature | aborted | Add inference time layer fusion optimisations via PreTrainedModel.from_pretrained(fuse_layers=True) | codebase moved on: PR adds an older fuse_layers boolean and fusion_mapping implementation that conflicts add/add with the cumulative branch newer fusion_config/fusion_mapping API and modeling_utils i… |
| huggingface#44875 | feature | aborted | refactor: improved the cli server module code organization | codebase moved on: CLI serve has already been reorganized around split serving utilities, reasoning mode, and continuous-batching request-id handling; this older single-file refactor conflicts across… |
| huggingface#44872 | documentation | skipped | Fix: Update outdated sampler comment in generation/utils.py | category not configured for this cumulative branch |
| huggingface#44830 | feature | aborted | Add AudioFlamingoNext model | codebase moved on: AudioFlamingoNext adds a generated model alias on top of older MusicFlamingo/AudioFlamingo3 modular files and conflicts with current docs toctree, model package init, auto registri… |
| huggingface#44815 | defect | aborted | Dequant fix | codebase moved on: finegrained FP8 dequantization code has since been refactored and accumulated static expert/Mistral4 support; the PR conflicts in finegrained_fp8.py after earlier Mistral4 changes,… |
| huggingface#44794 | feature | aborted | Refacto GGUF weight conversion | codebase moved on: GGUF conversion refactor now conflicts with current finegrained FP8 integration, GGUF PyTorch loader, and modeling_utils loading flow; the PR head is a broad stacked branch and res… |
| huggingface#44775 | documentation | skipped | [docs] n-d parallelism | category not configured for this cumulative branch |
| huggingface#44772 | documentation | skipped | bitsandbytes: Update links and docs | category not configured for this cumulative branch |
| huggingface#44729 | defect | aborted | Avoid floating point math for ceil operations | codebase moved on: broad int_div_ceil replacement spans configuration, trainer, tensor-parallel, modeling, and many image/video processors; current branch has since refactored fast image processors i… |
| huggingface#44722 | feature | aborted | Refactor gptj output tracing to use standardized decorators | codebase moved on: GPT-J output tracing refactor is based on a version with cache_position plumbing, while the cumulative branch includes later cache_position removal and current output/return_dict h… |
| huggingface#44682 | feature | aborted | transformers serve + llamacpp | codebase moved on: serve has since been split into serving handlers/utilities with reasoning and continuous-batching request-id changes, while this older llama.cpp integration rewrites the monolithic… |
| huggingface#44659 | documentation | skipped | docs: remove outdated use_diff docstring from DistributedConfig.to_js… | category not configured for this cumulative branch |
| huggingface#44646 | documentation | skipped | Fix typo: seperate -> separate | category not configured for this cumulative branch |
| huggingface#44642 | documentation | skipped | Clarify that causal LM labels are shifted internally | category not configured for this cumulative branch |
| huggingface#44601 | feature | aborted | [Distributed] Add PP support natively | codebase moved on: native pipeline-parallel loading patch targets the older core_model_loading flow where every loaded tensor goes through a mapping object; current branch has since added the direct_… |
| huggingface#44553 | feature | aborted | [] Refactor FA CB kwargs | codebase moved on: FA continuous-batching refactor conflicts in flash_paged.py where current branch already has a newer paged decode fast path, lazy kernel import flow, sliding-window handling, and v… |
| huggingface#44550 | documentation | skipped | Improve clarity and grammar in Auto Classes documentation | category not configured for this cumulative branch |
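
Pipe-delimited rows of rejected records in this report carry the five fields named in the header (PR, category, status, summary, reason), so the status counts can be re-derived from the text itself. A rough sketch follows, assuming one record per line and no literal `|` inside a cell. `FIELDS`, `parse_row`, and `tally_status` are illustrative names, not part of the tooling that generated the report.

```python
from collections import Counter
from typing import Dict, Iterable, Optional

# Field names are an assumption based on the table header in this report.
FIELDS = ("pr", "category", "status", "summary", "reason")


def parse_row(line: str) -> Optional[Dict[str, str]]:
    """Split one '| a | b | ... |' row into named fields; None for non-records."""
    line = line.strip()
    if not (line.startswith("|") and line.endswith("|")):
        return None
    cells = [cell.strip() for cell in line[1:-1].split("|")]
    if len(cells) != len(FIELDS):
        return None
    if "#" not in cells[0]:
        # Header and separator rows have no 'owner#number' reference in the first cell.
        return None
    return dict(zip(FIELDS, cells))


def tally_status(lines: Iterable[str]) -> Counter:
    """Count records per status, ignoring lines that are not record rows."""
    rows = (parse_row(line) for line in lines)
    return Counter(row["status"] for row in rows if row is not None)


sample = [
    "| PR | Category | Status | Original PR summary / goal | Rejection reason |",
    "| --- | --- | --- | --- | --- |",
    "| huggingface#45679 | other | skipped | TST Run fast PEFT tests in normal CI | category not configured for this cumulative branch |",
    "| huggingface#45189 | feature | validation_failed | Add doc test CI workflow reusing existing model job infrastructure | merge was clean but light validation timed out after compile/checkers passed |",
]

if __name__ == "__main__":
    print(tally_status(sample))
```

Rows with a different column count, such as the accepted-records table with its extra method and validation columns, are skipped rather than misread; handling them would need a second field tuple.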

@evalstate
Owner Author

Feature + defect flow status table (part 3/4)

| huggingface#44547 | documentation | skipped | Fix position_ids docstring in modeling_flash_attention_utils.py | category not configured for this cumulative branch |
| huggingface#44517 | feature | validation_failed | Add qwen3 tts | merge was clean but ruff reported undefined `__all__` export Qwen3TTSTokenizerMultiCodebookCode2WavConfig in qwen3_tts_tokenizer_multi_codebook configuration; reverted top merge |
| huggingface#44495 | feature | aborted | [Gradient Ckpting] Remove unnecessary attribute definitions | codebase moved on: broad gradient-checkpointing cleanup touches hundreds of generated and modular model files; current branch has many intervening model refactors/new models, producing conflicts acro… |
| huggingface#44467 | feature | aborted | Placeholder tokens update | codebase moved on: placeholder-token update spans tokenizer conversion, tokenizer auto mappings, base/tokenizers backends, and model tokenizers; current branch has intervening tokenizer error-handlin… |
| huggingface#44445 | feature | aborted | Adding support for GraniteDoclingHybrid | codebase moved on: GraniteDoclingHybrid adds a new model using older auto-mapping generated files; current branch has since changed configuration_auto to use generated auto_mappings plus non-standard… |
| huggingface#44420 | documentation | skipped | [docs] distributed training | category not configured for this cumulative branch |
| huggingface#44407 | documentation | skipped | docs: add energy efficiency considerations to bitsandbytes quantization guide | category not configured for this cumulative branch |
| huggingface#44394 | feature | aborted | 🚨🚧 FeatureExtractor → AudioProcessor | codebase moved on: PR is a repository-wide FeatureExtractor to AudioProcessor migration touching core preprocessing/image/audio APIs and generated per-model image/audio processors; current cumulative… |
| huggingface#44375 | feature | aborted | Add RF-DETR | codebase moved on: RF-DETR adds a full new model with generated auto mappings and LW-DETR loss changes; current branch has intervening generated auto-mapping/model-registration changes, causing confl… |
| huggingface#44314 | feature | aborted | add HyperClovaX Vision | codebase moved on: HyperClovaX Vision is a large new multimodal model plus Qwen2.5-VL changes and generated auto mappings; current branch has intervening conversion mapping, auto-mapping, video/image… |
| huggingface#44298 | defect | aborted | Auto detect wrong mapping models | codebase moved on: tokenizer backend auto-detection and SentencePiece/Gemma/T5 tokenizer reconstruction have been substantially refactored on the cumulative branch; the PR overlaps current precompile… |
| huggingface#44264 | feature | aborted | [Moe] Enable aux loss automatically when in training + coef is not 0 | codebase moved on: the MoE auxiliary-loss change touches many generated and modular MoE modeling/configuration files; current branch has intervening MoE model refactors and generated-file changes, pr… |
| huggingface#44252 | feature | aborted | Timm unification continued | codebase moved on: timm unification spans backbone utilities, core/model loading, auto mappings, timm_backbone/timm_wrapper configs and modeling, plus many vision model configs/tests; current cumulat… |
| huggingface#44178 | feature | aborted | Add xcodec2 model | codebase moved on: the new XCodec2 model PR was based on the older auto-configuration mapping layout and conflicts with the cumulative branch where auto mappings have been split into auto_mappings/SP… |
| huggingface#44161 | defect | aborted | Refactor LongT5 to use @capture_outputs and @can_return_tuple decorators for unified output handling (Fixes huggingface#43979) | codebase moved on: the LongT5 output-capturing refactor conflicts throughout modeling_longt5.py with the current cumulative branch; it rewrites forward signatures, output plumbing, and tuple/dict ret… |
| huggingface#44154 | defect | aborted | Refactored vits to match standardized output collection interface | codebase moved on: the VITS output-capturing refactor conflicts in modeling_vits.py and uses an older/non-current capture-output API/import shape, so resolving safely would require reworking the deco… |
| huggingface#44129 | defect | aborted | Refactor SpeechT5 output tracing to standardized output capture | codebase moved on: the SpeechT5 output-capturing migration conflicts broadly across modeling_speecht5.py, including encoder/decoder forward signatures, output recorder plumbing, and multiple task hea… |
| huggingface#44123 | defect | aborted | Avoid device sync in training loss accumulation | codebase moved on: Trainer training-loop structure has been substantially refactored on the cumulative branch, with on-device loss accumulation already partially present; the PR conflicts across chec… |
| huggingface#44116 | defect | aborted | [WIP] [Flaubert] Refactor output tracing to decorator-based interface | codebase moved on: the Flaubert decorator migration conflicts throughout modeling_flaubert.py, including imports, attention output contracts, base model forward plumbing, and multiple wrapper heads; … |
| huggingface#44114 | defect | aborted | Migrate wav2vec2, wav2vec2_conformer, and wav2vec2_bert to standardized output collection decorators | codebase moved on: the wav2vec2-family output-capture migration spans many copied/generated and modular audio model files; current cumulative branch has intervening output-capturing, modular-generati… |
| huggingface#44101 | feature | aborted | [XLM] Refactor output tracing to align with capture_outputs standardized architecture | codebase moved on: the XLM output-capture migration conflicts in both XLM and copied Flaubert modeling files; the cumulative branch already contains newer output-capturing/Flaubert refactors, so safe… |
| huggingface#44098 | feature | aborted | [ViLT] Refactor output handling to align with standardized patterns | codebase moved on: ViLT output handling now conflicts in package init exports and modeling_vilt.py with the cumulative branch output-capture changes; resolving would need a current-tree ViLT migratio… |
| huggingface#44086 | feature | validation_failed | [MGP-STR] Refactor output tracing to use capture_outputs/can_return_tuple decorators | merge was clean but compileall failed with SyntaxError: unmatched ")" in src/transformers/models/mgp_str/modeling_mgp_str.py |
| huggingface#44085 | feature | aborted | Refactor RemBERT to use output tracing decorators | codebase moved on: the PR title says RemBERT but the diff edits GPT-J; modeling_gptj.py already has intervening output-tracing changes on the cumulative branch, causing content conflicts that should … |
| huggingface#44083 | feature | aborted | FSDP2 native support in transformers | codebase moved on: native FSDP2 is a broad distributed/modeling feature touching generation, accelerate/FSDP/MoE integrations, modeling utilities, and common tests; the merge conflicts in tests/test_… |
| huggingface#44076 | feature | aborted | Refectored modeling_imagegpt.py to enable hooks to capture_outputs | codebase moved on: ImageGPT modeling has diverged under the output-capture refactor work; the PR conflicts in modeling_imagegpt.py, and applying it safely requires redoing the capture_outputs/can_ret… |
| huggingface#44074 | feature | aborted | [TextNet] Refactor output tracing using capture_outputs decorator | codebase moved on: TextNet output tracing has diverged in the current cumulative branch; modeling_textnet.py conflicts while tests also changed, so safe integration needs a current-tree TextNet captu… |
| huggingface#44073 | feature | aborted | [VisualBert] Refactor output tracing using capture_outputs and can_return_tuple decorators | codebase moved on: VisualBert output-capture decorator migration conflicts in modeling_visual_bert.py with current output handling changes; safely landing it would require reapplying the migration to… |
| huggingface#44072 | feature | aborted | refactor efficientnet output tracing with @capture_outputs and @can_r… | codebase moved on: EfficientNet output tracing changes conflict in modeling_efficientnet.py after intervening capture_outputs/model output refactors; safe resolution requires a fresh migration agains… |
| huggingface#44071 | feature | aborted | [Refactor] Migrate MPT to standardized output tracing decorators | codebase moved on: MPT standardized output tracing migration conflicts in modeling_mpt.py with current cumulative branch output-handling changes; landing it would require a fresh current-tree decorat… |
| huggingface#44068 | feature | aborted | Refactor GPT-Neo to use and decorators | codebase moved on: GPT-Neo output-capture refactor conflicts throughout modeling_gpt_neo.py with current cache_position and output-handling signatures; safely landing it would require reapplying the … |
| huggingface#44066 | feature | aborted | Refactor GPT-J to use standardized output tracing (huggingface#43979) | codebase moved on: GPT-J/CodeGen standardized output tracing conflicts with the current cache_position-aware forward signatures and output plumbing; safe integration would require reapplying the deco… |
| huggingface#44054 | feature | aborted | Flash mla interface | codebase moved on: the experimental Flash MLA/GLM MoE DSA integration conflicts across hub kernel registration, modeling utilities, FP8 integration, GLM MoE DSA config/model/modular files, and tests;… |
| huggingface#44018 | feature | aborted | Refactor GPT-Neo output tracing to use capture_outputs/can_return_tuple | codebase moved on: GPT-Neo output-tracing refactor conflicts throughout modeling_gpt_neo.py with current cache_position-removal and TransformersKwargs/output-capture signatures; safely landing it wou… |
| huggingface#44015 | feature | aborted | Refactor GPT2-based models to standardized output collection interface | codebase moved on: the GPT2/GPTBigCode/DecisionTransformer decorator migration conflicts across current merge_with_config_defaults, OutputRecorder regex layer names, cache_position-removal changes, T… |
| huggingface#44007 | feature | aborted | [ResNet] Refactor output tracing to decorator-based interface | codebase moved on: ResNet output tracing had already been partially migrated on the cumulative branch, while this older PR also changes RegNet/RT-DETR ResNet copied paths and conflicts with current c… |
| huggingface#44004 | feature | aborted | refactor output tracing for codegen | codebase moved on: CodeGen output-tracing refactor conflicts across attention/model/LM forward signatures with current TransformersKwargs and cache_position-removal changes; resolving would require r… |
| huggingface#44003 | feature | aborted | refactor output tracing in mamba | codebase moved on: Mamba/Falcon-Mamba decorator migration conflicts with current cache_position-removal and output handling in imports, backbone forwards, and LM heads; a safe landing would need a fr… |
| huggingface#43996 | feature | aborted | Refactor FNet and CVT output tracing | codebase moved on: FNet/CvT output-tracing refactor conflicts in many model, pretraining, masked-LM, NSP, classification, and QA forward paths with the cumulative branch partially migrated can_return… |
| huggingface#43995 | feature | aborted | Refactoring falcon model to match standardized output collection interface | codebase moved on: Falcon output-tracing refactor conflicts with cumulative branch changes that removed cache_position plumbing and partially migrated output flag handling; resolving safely would req… |
| huggingface#43973 | feature | aborted | Add lfm2.5 audio | codebase moved on: LFM2 audio adds a new model against the older auto-mapping layout, while the cumulative branch now sources configuration mappings from generated auto_mappings plus current generate… |
| huggingface#43924 | feature | aborted | [] More old mask APIs | codebase moved on: broad old attention-mask API migration conflicts across many model files with cumulative output/signature and attention-mask refactors; resolving would require a fresh coordinated … |
| huggingface#43888 | feature | aborted | Support for BharatGen's Param2MoE model architecture | codebase moved on: Param2MoE adds a new model against the older inline configuration_auto mapping, while the cumulative branch uses generated auto_mappings/current auto-generation; safe integration w… |
| huggingface#43785 | defect | aborted | Fix FSDP_CPU_RAM_EFFICIENT_LOADING (huggingface#43749) | codebase moved on: FSDP CPU RAM loading changes conflict in integrations/fsdp.py with current PEFT/FSDP helper additions, and the PR also applies its core_model_loading early-return inside the conver… |
| huggingface#43757 | defect | aborted | Avoid hard failure for gpt-oss GGUF architecture by falling back to g… | codebase moved on: modeling_gguf_pytorch_utils.py now maps gpt-oss to the dedicated gpt_oss architecture and includes GptOssTensorProcessor support, so the PR fallback to gpt-neox conflicts with and … |
| huggingface#43751 | other | skipped | Fix ruff warnings | category not configured for this cumulative branch |
| huggingface#43743 | feature | aborted | Modular playground | codebase moved on: Persimmon modeling has diverged with current cache/output-capturing and kwargs patterns plus an existing modular_persimmon.py, making the playground branch conflicts broad and not … |
| huggingface#43665 | other | skipped | fix | category not configured for this cumulative branch |
| huggingface#43656 | defect | aborted | Fix TypeAdapter NameError in transformers CLI | codebase moved on: cli/serve.py has been substantially rewritten and no longer uses TypeAdapter or the old annotation-heavy serving implementation, so the PR annotations/import fix is obsolete and co… |
| huggingface#43649 | feature | aborted | Check new failures reporting 5 | codebase moved on: CI self-scheduled and notification/check_bad_commit workflows have diverged, including current actor gating and expanded machine/slice matrix conflicting with the PR new-failure re… |
| huggingface#43532 | other | skipped | [do not merge] Show diff | category not configured for this cumulative branch |
| huggingface#43488 | other | skipped | [don't merge] bad format to check repo bot | category not configured for this cumulative branch |
| huggingface#43448 | feature | aborted | Add Molmo | codebase moved on: Molmo adds generated docs/model registries and auto mappings that now conflict with current generated registry structure after Molmo2 and many later model additions; resolving woul… |
| huggingface#43446 | feature | aborted | [typings] Automatically type decorator return types as tuple \| X | codebase moved on: decorator return typing rewrite touches generated modeling/modular files and repo check wiring across the repository; 62 conflicts across core CI/checker files and many model files… |
| huggingface#43424 | other | skipped | Add test to ensure executorch exportability with dynamic shapes | category not configured for this cumulative branch |
| huggingface#43340 | other | skipped | Claude code skills for transformers-api | category not configured for this cumulative branch |
| huggingface#43333 | other | skipped | Fix typo: interupted -> interrupted | category not configured for this cumulative branch |
| huggingface#43310 | other | skipped | Replace regex with re | category not configured for this cumulative branch |
| huggingface#43297 | feature | aborted | [Feat] Reduces redundant tokenization of tags to accelerate Qwen3VL. | codebase moved on: PR implementation rewrites the older `Qwen3VLProcessor.__call__` path and the PR head also contains a merge-from-main commit that would drag broad unrelated history; current qwen3_vl… |
| huggingface#43271 | other | skipped | Fix typo: necesary → necessary | category not configured for this cumulative branch |
| huggingface#43267 | documentation | skipped | Add auto_docstring decorator to Sam3ImageProcessorFast | category not configured for this cumulative branch |
| huggingface#43265 | feature | aborted | Adding Omnilingual ASR models | codebase moved on: Omnilingual ASR draft adds a new model against the older explicit auto-mapping layout, while current auto configuration imports generated auto_mappings and requires the modular/new… |
| huggingface#43249 | feature | aborted | [WIP] Processor moves to in | codebase moved on: PR changes processor device handling on an older tree where fast image processor files were tracked, but this cumulative branch has those generated fast image processor files as pr… |
| huggingface#43246 | defect | aborted | GptOss slow tests | codebase moved on: current GptOss slow tests already use Mxfp4Config(dequantize=not quantized), device-qualified expected-output keys, XPU/CPU skips, and expanded distributed/training cases; the PR e… |
| huggingface#43238 | defect | aborted | Fix ObjectDetectionPipeline batch processing bug huggingface#31356 | codebase moved on: PR head includes a checked-in virtualenv and its object_detection.py patch targets an older postprocess signature, duplicating the method and dropping current top_k/labels/box_form… |
| huggingface#43213 | feature | aborted | feat: allow output_hidden_states and output_attensions to record outputs of specific layers | codebase moved on: PR modifies the old check_model_inputs/output-recording hook path in utils.generic, but current generic.py has replaced that flow with merge_with_config_defaults and a deprecated c… |
| huggingface#43192 | feature | aborted | [Trackio] support trackio gpu logging | codebase moved on: TrackioCallback has since gained static Space freezing, bucket_id handling, changed project/space setup, and docs now state report_to defaults to none; the PR GPU-logging/min-versi… |
| huggingface#43149 | defect | aborted | docs(serving): add minimal Python client examples for chat completion… | codebase moved on: PR mixes serving docs with an older monolithic Serve implementation and tests; current serve code has been refactored into cli/serving handlers and docs moved under docs/source/en/… |
| huggingface#43139 | feature | validation_failed | [perf] optimize whisper GPU performance | repository style/lint failed after merge: PR introduces undefined Optional/Union annotations in Whisper feature extractor plus whitespace issues |
| huggingface#43104 | documentation | skipped | docs: clarify tokenizer decoder behavior in v5 (huggingface#43066) | category not configured for this cumulative branch |
| huggingface#43102 | documentation | skipped | Add CPU vs GPU performance comparison example | category not configured for this cumulative branch |
| huggingface#43077 | other | skipped | compileable=>compilable | category not configured for this cumulative branch |
| huggingface#43063 | documentation | skipped | Improve documentation for SegFormer image processor | category not configured for this cumulative branch |
| huggingface#43036 | documentation | skipped | Docs: fix grammar in Pipeline section | category not configured for this cumulative branch |
| huggingface#43020 | feature | aborted | Add mimo v2 flash | codebase moved on: MiMo-V2-Flash new-model PR targets the older explicit auto-mapping files and provides only a modular model skeleton; current auto configuration imports generated auto_mappings and … |
| huggingface#42982 | feature | aborted | Add HumanV: decoder-only causal LM | codebase moved on: HumanV new-model PR targets older auto configuration/modeling mappings and adds broad auto-class entries while current configuration_auto imports generated auto_mappings; safely in… |
| huggingface#42978 | feature | aborted | Add ViT NEPA | codebase moved on: ViT NEPA new-model PR targets older docs/loss registries and explicit auto configuration/modeling mappings; current branch uses generated auto_mappings and has diverged docs/loss r… |
| huggingface#42976 | other | skipped | Upgrade GitHub Actions to latest versions | category not configured for this cumulative branch |
| huggingface#42975 | other | skipped | Upgrade GitHub Actions for Node 24 compatibility | category not configured for this cumulative branch |
| huggingface#42944 | feature | aborted | [Quantization] From config Quantization for FP8 | codebase moved on: FP8 from_config support conflicts in finegrained_fp8 integration and modeling_utils quantization loading paths that have since diverged on the cumulative branch; resolving would re… |
| huggingface#42919 | feature | aborted | [WIP] Video support in vLLM backend | codebase moved on: vLLM video-token support touches many multimodal processors and Llava OneVision modeling; current branch has diverged processor signatures and modular Qwen2.5-VL code, causing broa… |
| huggingface#42908 | defect | aborted | Fix gguf tokenizers | codebase moved on: GGUF tokenizer fix conflicts in generation utilities, auto tokenization mapping, and tokenizer base loading code that have diverged on the cumulative branch; resolving safely would… |
| huggingface#42900 | defect | aborted | Fix: Set clean_up_tokenization_spaces | codebase moved on: PR changes the tokenizer clean_up_tokenization_spaces default to True, but current branch has deliberately changed the default to False and added a BPE-specific escape hatch; apply… |
| huggingface#42887 | feature | aborted | [Quantization] [Compressed Tensors] Support Transforms, Fix Tests | codebase moved on: compressed-tensors transform support conflicts in the quantizer, optional import detection, and compressed-tensors integration tests; current quantization config/import logic has d… |
| huggingface#42876 | documentation | skipped | Document tensor parallelism configuration with Trainer | category not configured for this cumulative branch |
| huggingface#42829 | feature | aborted | [WIP] End-to-end exportable pipelines (object detection) | codebase moved on: exportable object-detection pipeline changes overlap with fast image processor files that were removed/regenerated on the cumulative branch and with divergent image_transforms, RT-… |
| huggingface#42824 | defect | aborted | Fix torch only support for fast Processors | codebase moved on: MLX fast-processor fix depends on image_processing_utils_fast.py and related fast processor tests that are no longer tracked in the current cumulative branch, while feature_extract… |
| huggingface#42816 | defect | aborted | validate tokenizer components | codebase moved on: tokenizer component validation patch conflicts with the now-reworked tokenization_utils_tokenizers import set and convert_to_native_format flow, which now has optimized tokenizer.j… |
| huggingface#42785 | documentation | skipped | Fixing wrong information in Mimi Docs | category not configured for this cumulative branch |
| huggingface#42781 | feature | aborted | Add VibeVoice Realtime | codebase moved on: VibeVoice Realtime depends on a large model-family addition but the cumulative branch already contains overlapping VibeVoice/VibeVoice acoustic tokenizer docs, generated/modeling f… |
| huggingface#42767 | feature | aborted | fix: add mapping of deepseek_v32 model type | codebase moved on: DeepSeek v3.2 model-family addition conflicts with current generated/auto-mapping structure and newer DeepSeek model registrations (deepseek_ocr2/deepseek_v4); configuration_auto.p… |
| huggingface#42744 | other | skipped | [FP8 Devstral 24B] Repro PR | category not configured for this cumulative branch |
| huggingface#42742 | other | skipped | Remove redundant else in activations.py | category not configured for this cumulative branch |
| huggingface#42706 | other | skipped | Nit parakeet | category not configured for this cumulative branch |
| huggingface#42668 | defect | aborted | More robust processor from pretrained | codebase moved on: processor robustness changes overlap with a newer ProcessorMixin subprocessor/modality refactor already on the cumulative branch (_pop_prebuilt_subprocessors, modality aliases, TYP… |
| huggingface#42665 | feature | aborted | Some optimizations for offloading | codebase moved on: offloading optimization conflicts with a substantially refactored core_model_loading path that now uses LoadStateDictInfo, WeightMapping/direct_param_loads, device-mesh helpers, an… |
| huggingface#42655 | feature | aborted | New Feature: Enabling Speculative Decoding with Batch Size > 1 (If draft and target model share tokenizer) | codebase moved on: batched speculative decoding changes conflict with current assisted-generation APIs, candidate generator typing/signatures, assistant_generation_config handling, cache/position-id … |
| huggingface#42631 | defect | aborted | Make GraniteMoeHybridModel compatible with torch.export | codebase moved on: the GraniteMoeHybrid torch.export fix targets a newer/different forward signature using cache_position, check_model_inputs, HybridMambaAttentionDynamicCache imports, and is_torchdy… |
| huggingface#42588 | documentation | skipped | Document the /v1/models endpoint | category not configured for this cumulative branch |
| huggingface#42572 | documentation | skipped | docs: add doctest for SqueezeBERT | category not configured for this cumulative branch |
| huggingface#42527 | documentation | skipped | Added doctests for SwiftFormer model | category not configured for this cumulative branch |
| huggingface#42521 | defect | aborted | Fix FSDP2 defaulting to version 1 in TrainingArguments; add dynamic plugin param passthrough | codebase moved on: FSDP argument handling was substantially refactored on the cumulative branch to normalize legacy fsdp into fsdp_config, default to fsdp_config['version']=2, and relocate FSDP tests… |
| huggingface#42467 | defect | aborted | Fixes StaticCache Crashes | codebase moved on: StaticCache internals on the cumulative branch now use cumulative_length tensors, separate key/value head dimensions, crop support, and a different StaticSlidingWindowLayer update … |
| huggingface#42461 | feature | aborted | Refactor RMSNorm implementations to use torch.nn.functional.rms_norm | codebase moved on: the RMSNorm refactor touches 65 model files and conflicts broadly with current model implementations that have diverged since the PR; although each conflict is conceptually replaci… |
| huggingface#42453 | feature | aborted | Add SDPA and FlashAttention support to T5 | codebase moved on: T5/MT5 modeling files on the cumulative branch diverge substantially from the PR's older attention-interface refactor, producing many conflicts throughout imports, attention classe… |
| huggingface#42437 | other | skipped | One tok typing | category not configured for this cumulative branch |
| huggingface#42430 | other | skipped | 🚨 Clean up image-text-to-text pipeline | category not configured for this cumulative branch |
| huggingface#42415 | other | skipped | initial clean | category not configured for this cumulative branch |
| huggingface#42413 | feature | aborted | Add chatterbox support | codebase moved on: the new Chatterbox/S3Gen/S3Tokenizer model addition targets the older auto-configuration and repo-checker layout; current cumulative branch uses generated auto_mappings plus newer … |
| huggingface#42412 | other | skipped | Replace Optional and Union typing with \| in examples | category not configured for this cumulative branch |
| huggingface#42385 | defect | aborted | Fix weight tying logic between tied_weights_keys and tie_word_embeddings | codebase moved on: PR head contains many upstream/main merges and direct merge conflicted in core_model_loading plus dataclass-migrated model config files; intended tied-weight changes would need man… |
| huggingface#42345 | feature | aborted | GPT-OSS Flash Attention and memory-efficient attention via Native PyTorch SDPA | codebase moved on: GPT-OSS attention now uses ALL_ATTENTION_FUNCTIONS.get_interface and kernelized rotary path; PR adds a bespoke sdpa path against the older API and conflicts in generated and modula… |
| huggingface#42292 | documentation | skipped | docs: clarify recommended usage of max_new_tokens in generate() | category not configured for this cumulative branch |
| huggingface#42277 | documentation | skipped | doc(kernels): update kernels integration documentation | category not configured for this cumulative branch |
| huggingface#42244 | defect | aborted | [core] Fix torchao loading | codebase moved on: PR touches cross-cutting torchao/core model-loading code across hundreds of generated modeling files; direct merge produced 0 conflicts including modeling_utils/core_model_loading/… |
| huggingface#42229 | feature | aborted | Add openpangu_moe model | codebase moved on: new model registration conflicts with current generated auto mapping/toctree structure; configuration_auto.py has been refactored from full OrderedDict definitions to generated map… |
| huggingface#42210 | feature | aborted | [WIP] started adding support for evo2 | codebase moved on: Evo2 new-model registration conflicts with current toctree and generated auto mapping/tokenization layout; resolving would require regenerating mappings and adapting the WIP model … |
| huggingface#42166 | feature | aborted | add internvl_flash model | codebase moved on: InternVL Flash new-model registration conflicts with current generated auto configuration mapping and check_repo model ignore rules; resolving requires current model-generation con… |
| huggingface#42130 | defect | aborted | Try refactoring logging to make type checks pass | codebase moved on: logging.py has since gained lazy tqdm/huggingface_hub imports and a TransformersLogger typing protocol while this PR rewrites logger typing via setLoggerClass; resolving would requ… |
| huggingface#42127 | defect | aborted | Standardize conv len function for audio models | codebase moved on: audio convolution-length standardization now conflicts in multiple generated and modular audio model files; resolving would require propagating/regenerating model copies rather tha… |
| huggingface#42124 | documentation | skipped | 📚 docs(qwen3): add comprehensive usage examples and model details | category not configured for this cumulative branch |
| huggingface#42112 | feature | aborted | Add max_thinking_tokens for reasoning models (issue huggingface#42111) | codebase moved on: max_thinking_tokens generation feature conflicts with current GenerationConfig and generation tests; integrating would require adapting the new logits processor/config semantics to… |
| huggingface#42039 | other | skipped | [WIP] 🚨 clean xcodec 🧼 | category not configured for this cumulative branch |
| huggingface#42000 | documentation | skipped | Fix Mixtral: Docstring uses consistent 'top_k_index' and 'top_k_weights' in MixtralSparseMoeBlock | category not configured for this cumulative branch |
| huggingface#41992 | feature | aborted | [PoC] HF exporters | codebase moved on: broad exporter framework touches many modeling, modular, utility, and documentation files and now conflicts in multiple generated/modular multimodal model files; resolving would re… |
| huggingface#41980 | defect | aborted | Correct type hint in config models | codebase moved on: config rope/type-hint annotations have diverged across many generated and modular model configuration files; resolving would require broad modular regeneration and current typing r… |
| huggingface#41977 | feature | aborted | Add Phi3.5 Vision Model | codebase moved on: new Phi3.5 Vision model registration conflicts with current generated auto configuration/image-processing/modeling mappings; integrating would require regenerating and adapting the… |
| huggingface#41967 | feature | aborted | feat: RoPE-related typing improvements | codebase moved on: RoPE utility typing has changed in current modeling_rope_utils, conflicting with the PR typing refactor; resolution would need reviewing current RoPE parameter schemas rather than … |
| huggingface#41899 | feature | aborted | Testing checkpoint limit changes from PR huggingface#37196 | codebase moved on: checkpoint save-limit feature conflicts with current Trainer, TrainingArguments, and trainer tests; resolving would require reworking new save limit semantics against the current t… |
| huggingface#41886 | feature | aborted | ADD FG-CLIP2 | codebase moved on: FG-CLIP2 adds a full model and collides with current auto-mapping registries; resolving would require regenerating/validating model integration against the current generated auto f… |
| huggingface#41882 | feature | aborted | Support fdma for models with attention bias | codebase moved on: FDMA integration conflicts in attention docs, modeling utilities, testing utilities, and utility exports; resolution would need adapting to current attention/backend registration A… |
| huggingface#41880 | documentation | skipped | Indonesian Language Support for ReadMe | category not configured for this cumulative branch |
| huggingface#41823 | feature | aborted | Lfm2-VL vllm | codebase moved on: PR head is an old v4.57 release branch with ~1489 changed files and direct merge conflicts across CI, docs, generation, auto mappings, image processors, and the current LFM2-VL mod… |
| huggingface#41807 | documentation | skipped | git commit -m "Fix: corrected outdated documentation link in README.md" | category not configured for this cumulative branch |
| huggingface#41800 | documentation | skipped | Increasing clarity | category not configured for this cumulative branch |
| huggingface#41794 | other | skipped | Enable flake8-pie rules | category not configured for this cumulative branch |
| huggingface#41754 | feature | aborted | Add pytree registration for static cache | codebase moved on: StaticCache pytree registration PR was written against older cache layer names and older ExecuTorch export helpers; current code has StaticSlidingWindowLayer/current DynamicCache r… |
| huggingface#41733 | documentation | skipped | transformers CLI documentation issue | category not configured for this cumulative branch |
| huggingface#41721 | defect | aborted | Fix Qwen3-VL Processor flattening multi-image batches (fix huggingface#41709) | codebase moved on: PR edits the old monolithic `Qwen3VLProcessor.__call__`, but current branch has moved Qwen3-VL processing to ProcessorMixin helpers such as replace_image_token/replace_video_token wi… |
| huggingface#41710 | documentation | skipped | 🌐 [i18n-KO] Translated main_classes/backbones.md to Korean | category not configured for this cumulative branch |
| huggingface#41693 | feature | aborted | 🚨 Refactor ViT to updated standards | codebase moved on: PR is a broad vision-model standards refactor touching dozens of generated/model files; current cumulative branch already has newer conversion_mapping/test_modeling_common record-o… |
| huggingface#41654 | defect | aborted | Improve LLaMA tokenizer error when vocab is missing: suggest installi… | codebase moved on: PR modifies the old full slow LlamaTokenizer.get_spm_processor implementation, but current v5 branch has reduced tokenization_llama.py to the TokenizersBackend shim with no local S… |
| huggingface#41611 | documentation | skipped | Docs add custom loss example | category not configured for this cumulative branch |
| huggingface#41609 | defect | aborted | Fix gemma gguf tokenizer | codebase moved on: direct merge pulls stacked docs/processor changes and conflicts in tokenization_auto.py and processing_utils.py; the actual Gemma GGUF commit targets the old AutoTokenizer flow and… |
| huggingface#41597 | documentation | skipped | Standardize RoBERTa model card following issue huggingface#36979 | category not configured for this cumulative branch |
| huggingface#41584 | defect | aborted | Add clear error message for missing SentencePiece model in get_spm_processor (fix huggingface#41553) | codebase moved on: PR changes the old slow LlamaTokenizer.get_spm_processor SentencePiece implementation, but current v5 tokenization_llama.py is a TokenizersBackend/BPE shim with no get_spm_processo… |
| huggingface#41565 | documentation | skipped | 🌐 [i18n-KO] Updated perf_train_gpu_many.md | category not configured for this cumulative branch |
| huggingface#41557 | other | skipped | Update modeling_llama4.py | category not configured for this cumulative branch |
| huggingface#41531 | documentation | skipped | 🌐 [i18n-KO] Translated video_processor.md to Korean | category not configured for this cumulative branch |
| huggingface#41527 | documentation | skipped | 🌐 [i18n-KO] Translated selecting.md to Korean | category not configured for this cumulative branch |
| huggingface#41491 | feature | aborted | Add skip_unnecessary_grad_clip to TrainingArguments for optimized gradient clipping | codebase moved on: PR head includes a large merge from upstream and direct merge conflicts in the refactored Trainer/TrainingArguments training loop; the actual gradient-clipping change targets the o… |
| huggingface#45679 | other | skipped | TST Run fast PEFT tests in normal CI | category not configured for this cumulative branch |
| huggingface#41488 | other | skipped | Create 1 | category not configured for this cumulative branch |
| huggingface#41485 | defect | aborted | Fix smolvlm2 dtype mismatch final | codebase moved on: SmolVLM modular/modeling dtype handling already diverged with a newer self.dtype/StopIteration fallback, conflicting with the PR's inputs_embeds.dtype change in generated and modul… |
| huggingface#41419 | feature | aborted | First QAT for Finegrained FP8 | codebase moved on: QAT FP8 changes conflict in central lazy imports, import_utils optional dependency helpers, and finegrained_fp8 integration now changed substantially; safe resolution would require… |
| huggingface#41406 | other | skipped | 🚨 [v5] Remove deprecated cache classes | category not configured for this cumulative branch |
| huggingface#41362 | documentation | skipped | Added Hacktoberfest banner image to README.md | category not configured for this cumulative branch |
| huggingface#41333 | feature | aborted | Add DeepseekVLV2 Model | codebase moved on: DeepSeek-VL-V2 adds new model files but conflicts in multiple central auto-mapping registries whose ordering/contents have diverged; resolving would require regenerating and valida… |
| huggingface#41330 | other | skipped | Unskip and fix offline mode tests, use HF_HUB_OFFLINE, make hermetic | category not configured for this cumulative branch |
| huggingface#41329 | defect | aborted | Fix GIL=0 segfault and Add GIL=0 compat for regex paths | codebase moved on: PR wraps many slow tokenizer and conversion regex usages for Python GIL=0, but current v5 branch has replaced several tokenizers with TokenizersBackend shims and deleted deprecated… |
| huggingface#41315 | feature | aborted | [model deprecations] Define new version-based model deprecation/deletions with user warnings/exceptions | codebase moved on: deprecation infrastructure targets old PretrainedConfig and monolithic configuration_auto mappings, while current v5 branch has renamed/reworked PreTrainedConfig as a dataclass-sty… |
| huggingface#41312 | other | skipped | tests: unskip and fix offline mode test using HF_HUB_OFFLINE + hermetic cache warmup | category not configured for this cumulative branch |
| huggingface#41299 | feature | aborted | copied changes from soghomon-b:add-eval-step-limit-31561 | codebase moved on: eval-step-limit patch targets older monolithic Trainer, TrainingArguments documentation/fields, and in-file trainer tests, while current v5 branch has substantially reorganized tra… |
| huggingface#41273 | defect | aborted | fix(quantization): Skip weight initialization for quantized models | codebase moved on: quantized-loading fix conflicts across modeling_utils and many quantizer implementations; current v5 quantizer base/loading flow has different param_needs_quantization/preprocess/p… |
| huggingface#41272 | feature | aborted | feat: Add HRM Model | codebase moved on: HRM model PR is based on the older inline auto-mapping layout and docstring ignore list, while the cumulative v5 branch now imports generated auto mappings; resolving the model reg… |
| huggingface#41254 | documentation | skipped | docs: Add Hacktoberfest banner, Contributing and License sections to README | category not configured for this cumulative branch |
| huggingface#41251 | feature | aborted | Add deepseek 3.2 exp | codebase moved on: DeepSeek 3.2 experimental model conflicts with current docs toctree, FP8 integration code, package lazy imports, and generated auto configuration/model/tokenizer mappings; current … |
| huggingface#41215 | defect | aborted | Fix CLIP memory leak causing 600-800MB accumulation per batch | codebase moved on: CLIP/AIMv2/MetaCLIP2 get_*_features now use the v5 BaseModelOutputWithPooling/can_return_tuple API and return projected output objects, while the PR changes these methods back to r… |
| huggingface#41202 | feature | aborted | [WIP] standardize audio kwargs | codebase moved on: audio-kwargs standardization conflicts in Whisper generation signatures/body and would require renaming input_features/audio_spectrogram through the current v5 generation flow, whi… |
| huggingface#41162 | documentation | skipped | Add Sinhala (සිංහල) translation of README | category not configured for this cumulative branch |
| huggingface#41160 | documentation | skipped | [docs] update tips syntax | category not configured for this cumulative branch |
| huggingface#41159 | feature | aborted | Support setting total_train_batch_size. | codebase moved on: total_train_batch_size patch targets older monolithic Trainer and TrainingArguments/DeepSpeed sections, while the cumulative v5 branch has reorganized optimizer/training code and s… |
| huggingface#41097 | defect | aborted | Delay and probably avoid unnecessary graph breaks in _upad_input of modeling_flash_attention_utils.py | codebase moved on: flash-attention utilities now include FA3/FA4 kernel fallback, tracing helpers, and a local indexing path that diverges substantially from the PR's older _get_unpad_data/_upad_inpu… |
| huggingface#41095 | feature | aborted | Add LLaVA-OneVision-1.5 model and related configurations | codebase moved on: LLaVA-OneVision-1.5 adds a new model but conflicts in central auto-mapping registries and check_repo ignore lists that have been reorganized/generated in the cumulative v5 branch; … |
| huggingface#41053 | feature | aborted | Qwen3 moe | codebase moved on: PR rewrites Qwen3-MoE expert/router/decoder internals with experimental env-gated grouped GEMM code, while current cumulative branch uses modular inheritance from Qwen2Moe experts/… |
| huggingface#41040 | feature | aborted | Add Keye vl 8b 1.5 | codebase moved on: new Keye VL model conflicts in generated auto-mapping registries and check_repo ignore lists, and includes fast image processor artifacts from an older tree; safe integration would… |
| huggingface#41037 | other | skipped | Tests: Apertus integration tests | category not configured for this cumulative branch |
| huggingface#41035 | documentation | skipped | docs: update speech recognition examples to use modern Common Voice d… | category not configured for this cumulative branch |
| huggingface#41024 | feature | aborted | Deprecate max_size in ConditionalDetrImageProcessor with warning | codebase moved on: PR targets older generated ConditionalDetr image processor/test layout; current branch generates the processor from modular_conditional_detr and has v5 image-processing tests cover… |
| huggingface#41022 | documentation | skipped | 🌐 [i18n-KO] Translated backbones.md to Korean | category not configured for this cumulative branch |
| huggingface#41021 | documentation | skipped | 🌐 [i18n-KO] Translated video_processors.md to Korean | category not configured for this cumulative branch |
| huggingface#41009 | feature | aborted | Add Lexa-Delta model support | codebase moved on: Lexa-Delta adds a new model against older explicit auto-mapping registries; current cumulative branch uses reorganized/generated v5 auto mappings and tokenizer mapping shapes, so r… |
| huggingface#40962 | feature | aborted | perceptron: Isaac-0.1 implementation | codebase moved on: Isaac new-model PR targets older explicit auto-mapping registries; current cumulative branch uses generated/update-based mappings and has diverged image processor mapping/check-rep… |
| huggingface#40888 | documentation | skipped | DOC Fix help for chat and serve commands | category not configured for this cumulative branch |
| huggingface#40887 | feature | aborted | Refactor output handling in generate for cleaner decoding methods | codebase moved on: generation utils now include MTP decoding, async stopping-criteria handling, flash-attention compile checks, and assisted-generation defaults from newer cumulative changes; the PRs… |
| huggingface#40877 | documentation | skipped | Bug huggingface#40833: Fix for kv_offset calculation for mixed padding | category not configured for this cumulative branch |
| huggingface#40871 | feature | aborted | Refactor benchmark utils: add type hints, GPU metrics helper, and con… | codebase moved on: PR modifies benchmark_v2/benchmark_framework.py, but that file is deleted/untracked in the current cumulative v5 branch and benchmark infrastructure has moved; applying the old ben… |
| huggingface#40870 | feature | aborted | Reduce vRAM usage during generation by allowing to transfer logits to CPU | codebase moved on: offload-logits feature patches older generation output accumulation and GenerationConfig defaults, while current cumulative branch has dataclass-style default None handling, async … |
| huggingface#45703 | other | skipped | chore(typing): add ty type checking for 10 utility files | category not configured for this cumulative branch |
| huggingface#40820 | feature | aborted | Add models to benchmarks | codebase moved on: PR targets older benchmark_v2/benchmark_framework.py plus benchmark_v2/benches files; current cumulative branch has benchmark infrastructure under benchmark_v2/framework and untrac… |
| huggingface#40738 | documentation | skipped | Docs: Clarify rjieba installation for RoFormerTokenizer | category not configured for this cumulative branch |
| huggingface#40736 | documentation | skipped | 🌐 [i18n-KO] Translated jan.md to Korean | category not configured for this cumulative branch |
| huggingface#40728 | feature | aborted | feat(serve): add OTEL | codebase moved on: PR modifies legacy src/transformers/commands/serving.py for transformers serve OTEL, but commands package is no longer tracked in this v5-era cumulative tree and untracked legacy c… |
| huggingface#40714 | documentation | skipped | Remove TF and Flax from README | category not configured for this cumulative branch |
| huggingface#40670 | feature | aborted | Add ability to run Gemma 2 models without post layer norm | codebase moved on: PR edits Gemma2/T5Gemma config init code, but current tree has strict dataclass-style configs and modular-generated model files; resolving requires porting the feature into the… |
| huggingface#40648 | other | skipped | Bump torch from 2.7.1 to 2.8.0 in /examples/flax/vision | category not configured for this cumulative branch |
| huggingface#40640 | defect | aborted | Resume training by trained samples to avoid elastic job loss or over-reading of data. | codebase moved on: Trainer training loop/state has been substantially refactored around _init_training_state/_run_epoch; PR changes older inline resume logic and large test sections, causing broad co… |
| huggingface#40637 | feature | aborted | [WIP]Add openpangu_dense model | codebase moved on: direct merge would drag unrelated upstream history/PR huggingface#40948, and cherry-picking OpenPangu model commits conflicts with current generated auto mappings/model registry layout; porti… |
| huggingface#40633 | feature | aborted | Add support for Custom Accelerate Instance in Trainer | codebase moved on: PR adds a custom Accelerator parameter to the old monolithic Trainer init and also includes broad lint/style churn; current Trainer has been refactored into a typed staged initiali… |
| huggingface#40546 | feature | aborted | Implement VibeVoice | codebase moved on: PR is a large new-model branch with 220 commits and old generated auto-mapping layout; current configuration_auto.py is generated around CONFIG_MAPPING_NAMES.update while the PR re… |
| huggingface#40524 | documentation | skipped | Use begin_of_sequence token in all sliding windows for correct model behaviour | category not configured for this cumulative branch |
| huggingface#40505 | feature | aborted | Refactor Siglip-like models | codebase moved on: SigLIP-like models have diverged in current modular/generated implementations; refactor conflicts across AimV2, Idefics2/3 and SigLIP modeling/modular files, requiring semantic reg… |
| huggingface#40493 | defect | aborted | Update dtypes to suit colab bf16 -> fp16 -> fp32. | codebase moved on: dtype resolution/loading code has been refactored into the current _get_dtype/from_pretrained flow; PR touches older modeling_utils import and dtype fallback locations and conflict… |
| huggingface#40473 | other | skipped | [tests] Unskip DeBERTaV2 tokenizer parity tests; re-enable fast/slow checks | category not configured for this cumulative branch |
| huggingface#40471 | documentation | skipped | DOC: Standardize CodeGen model card (issue huggingface#36979) | category not configured for this cumulative branch |
| huggingface#40465 | documentation | skipped | 🌐 [i18n-KO] Translated tools.md to Korean | category not configured for this cumulative branch |
| huggingface#40464 | documentation | skipped | 🌐 [i18n-KO] Translated agents.md to Korean | category not configured for this cumulative branch |
| huggingface#40448 | feature | aborted | [model] Support MiniCPM-V 4.5 | codebase moved on: MiniCPM-V 4.5 PR targets the older explicit auto mapping tables, while current v5 auto files are generated and use post-parse CONFIG/MODEL/PROCESSOR mapping updates; adding the new… |
| huggingface#40446 | feature | aborted | Add convert_segmentation_map_to_binary_masks_sorted function for hand… | codebase moved on: Mask2Former image processors have been regenerated from modular/torchvision backends and the PR adds a numpy helper/tests against the older generated slow processor; resolving safe… |
| huggingface#40425 | defect | aborted | Fix check_quantized_param method when param_value is a safetensors slice | codebase moved on: the old check_quantized_param API targeted by the PR has been replaced by param_needs_quantization and centralized loading/dtype handling; the PR conflicts across five quantizers a… |
| huggingface#40404 | documentation | skipped | update model card for gpt-j | category not configured for this cumulative branch |
| huggingface#40403 | feature | aborted | Customizable Logit Warping Strategies for Generation huggingface#40010 | codebase moved on: generation configuration and logits processing have diverged substantially (continuous batching support flags, nullable GenerationConfig defaults, new sampling controls and safety … |
| huggingface#40400 | documentation | skipped | fixed redundant words in readme.md | category not configured for this cumulative branch |
| huggingface#40395 | documentation | skipped | 🌐 [i18n-KO] Updated text_generation.md | category not configured for this cumulative branch |
| huggingface#40390 | documentation | skipped | Fix typo: 'lenght' to 'length' | category not configured for this cumulative branch |
| huggingface#40328 | feature | aborted | [rfc] Prototype to make torch.compile work with DynamicCache | codebase moved on: DynamicCache and generation compile gating have been refactored in current v5 (new cache layer constructors, cache_params lookup, ContinuousMixin/generation flow, XPU/neuron compil… |
| huggingface#40299 | feature | aborted | Remove deprecated max_size parameter from ConditionalDetr image processors | codebase moved on: DETR-family image processors have been split/regenerated into torchvision/PIL backends in current v5, while the PR removes max_size across the older generated processors and confli… |
| huggingface#40286 | feature | aborted | Add MOSS-TTSD with XY-Tokenizer | codebase moved on: adding MOSS-TTSD/XY-Tokenizer targets the older explicit auto mapping tables and repo check allowlists, while current v5 auto/model files are generated/post-processed and model add… |
| huggingface#40265 | other | skipped | Enable PLW and PLE rules | category not configured for this cumulative branch |
| huggingface#40225 | documentation | skipped | docs: clarify decoder_input_ids vs decoder_inputs_embeds usage (huggingface#39542) | category not configured for this cumulative branch |
| huggingface#40209 | feature | aborted | Add fast image processor for ViViT | codebase moved on: current v5 image processor auto mappings use backend-specific torchvision/PIL dictionaries and ViViT tests were refactored, while the PR adds an older single fast image processor f… |
| huggingface#40180 | feature | aborted | Enable native mxfp4 training support for GPT-OSS models | codebase moved on: current mxfp4 integration, quantizer, quantization config, and tests have diverged from the PR, including existing mxfp4 fixes and refactored tests; the PR adds a large autograd/ha… |
| huggingface#40177 | defect | aborted | Revert text_positions in Qwen25VL | codebase moved on: Qwen2.5-VL generation now computes packed text+vision position_ids through prepare_position_ids_for_generation and modular/generated code, whereas the PR removes an older prepare… |
| huggingface#40171 | other | skipped | Rename to CamelCase | category not configured for this cumulative branch |
| huggingface#40155 | documentation | skipped | 🌐 [i18n-KO] Translated t5.md to Korean | category not configured for this cumulative branch |
| huggingface#40149 | feature | aborted | Implemented fast image processor for VitPose | codebase moved on: fast image processor infrastructure was removed/refactored in current v5 (image_processing_utils_fast.py is deleted, auto image processor mappings are backend-specific torchvision/… |
| huggingface#40131 | documentation | skipped | add missing Arabic translations | category not configured for this cumulative branch |
| huggingface#40102 | documentation | skipped | 🌐 [i18n-KO] Translated to Korean | category not configured for this cumulative branch |
| huggingface#40064 | documentation | skipped | 🌐 [i18n-KO] Translated videomae.md to Korean | category not configured for this cumulative branch |
| huggingface#40061 | documentation | skipped | 🌐 [i18n-KO] Translated vitdet.md to Korean | category not configured for this cumulative branch |
| huggingface#40047 | documentation | skipped | Update wavlm.md to match new model card template | category not configured for this cumulative branch |
| huggingface#39987 | feature | aborted | Add a VGGT(Visual Geometry Grounded Transformer) model compatible with huggingface transfromers | codebase moved on: PR adds VGGT against older static auto-mapping files, while current branch uses generated/updated auto registry structure; direct merge conflicts in model registries and resolving … |
| huggingface#39962 | defect | aborted | Use torch._check instead of a test to make the model Gemma3 exportable | codebase moved on: PR touches many multimodal modeling and tests to replace shape assertions with torch._check, but current branch has substantially refactored generated/modular model files and the d… |
| huggingface#39931 | feature | aborted | Registers StaticCache serialization functions for torch.export.export | codebase moved on: StaticCache and executorch integration have changed since the PR; direct merge conflicts in cache_utils, integrations/executorch.py, and cache tests. Current cache code already con… |
| huggingface#39922 | documentation | skipped | 🌐 [i18n-KO] Translated attention_interface.md to Korean | category not configured for this cumulative branch |
| huggingface#39920 | documentation | skipped | 🌐 [i18n-KO] Updated ko/perf_train_special.md | category not configured for this cumulative branch |
| huggingface#39917 | documentation | skipped | 🌐 [i18n-KO] Updated ko/perf_train_cpu.md | category not configured for this cumulative branch |
| huggingface#39901 | documentation | skipped | 🌐 [i18n-KO] Translated fp_quant to Korean | category not configured for this cumulative branch |
| huggingface#39899 | feature | aborted | [model] Support MiniCPM-V 4.0 | codebase moved on: PR adds MiniCPM-V 4.0 against older static auto registries; direct merge conflicts in all auto mapping files and the model package registry, and the PR also carries obsolete genera… |
| huggingface#39886 | documentation | skipped | 🌐 [i18n-KO] Translated perf_train_gaudi.md to Korean | category not configured for this cumulative branch |
| huggingface#39859 | feature | aborted | WIP: Initial support for bnb 4bit on any nn.Parameter | codebase moved on: PR implements bnb target_parameters through older create_quantized_param/check_quantized_param paths, while current branch has refactored 4-bit loading into param_needs_quantizatio… |
| huggingface#39831 | other | skipped | refactor(modeling_llama): make RotaryEmbedding default path explicit | category not configured for this cumulative branch |
| huggingface#39807 | documentation | skipped | 🌐 [i18n-KO] Translated bamba.md to Korean | category not configured for this cumulative branch |
| huggingface#39796 | feature | aborted | [pipelines] text-to-audio pipeline standardization | codebase moved on: text-to-audio standardization touches pipeline behavior plus CSM/Dia/Qwen2.5-Omni model internals and broad tests; current branch has diverged in generation/modeling/pipeline/test … |
| huggingface#39792 | defect | aborted | Served models handle with nested content | codebase moved on: PR targets the legacy transformers serve command/test paths, but in the current tracked tree src/transformers/commands/serving.py and tests/commands/test_serving.py are not tracked… |
| huggingface#39772 | defect | aborted | Fix missing initializations for models created in 2022 | codebase moved on: PR tries to add missing initialization tests and _init_weights assignments across 2022-era models, but current cumulative branch has diverged in almost every touched model/test fil… |
| huggingface#39760 | feature | aborted | [Draft] Add Llasa TTS family of models | codebase moved on: Llasa is a draft new-model addition against older auto-mapping/check_repo and docs TOC layout. Direct merge conflicts in auto configuration registration, generated docs TOC, and re… |
| huggingface#39756 | defect | aborted | Fix rope_deltas corruption in Qwen2.5VL during CFG generation | codebase moved on: PR targets older Qwen2/Qwen2.5-VL and GLM4V generation/rope_deltas code, while current branch has a redesigned 3D position-id preparation and expanded multimodal generation helpers… |
| huggingface#39751 | documentation | skipped | 🌐 [i18n-KO] Translated text-to-speech.md to Korean | category not configured for this cumulative branch |
| huggingface#39722 | feature | aborted | [Feat] Adding Intern-S1 | codebase moved on: Intern-S1 is a broad new-model addition against older auto mappings, processor/video processor registration, docs TOC, and check_repo expectations. Current branch conflicts in all … |
| huggingface#39718 | documentation | skipped | Fix SigLIP2 documentation model/processor mismatch | category not configured for this cumulative branch |
| huggingface#39708 | documentation | skipped | 🌐[i18n-bn] Introduce Bengali version of Transformers documentation | category not configured for this cumulative branch |
| huggingface#39632 | documentation | skipped | fix dead NVIDIA link | category not configured for this cumulative branch |
| huggingface#39631 | feature | aborted | [serve] Add speech-to-text | codebase moved on: the serving command/docs files targeted by the PR are currently untracked in this cumulative v5-era worktree and would be overwritten by a direct merge; the PR is a large 47-commit… |
| huggingface#39588 | feature | validation_failed | WIP, reference modeling | merge was clean, but PR-local ruff found many new style errors plus an undefined MultiModalProjector symbol in the added reference_vlm model |
| huggingface#39575 | documentation | skipped | 🌐 [i18n-KO] Translated vitpose.md to Korean | category not configured for this cumulative branch |
| huggingface#39563 | documentation | skipped | 🌐 [i18n-KO] Translated vision-encoder-decoder.md to Korean | category not configured for this cumulative branch |
| huggingface#39559 | documentation | skipped | 🌐 [i18n-KO] Translated main_classes/deepspeed.md to Korean | category not configured for this cumulative branch |

@evalstate
Owner Author

Feature + defect flow status table (part 4/4)

| huggingface#39544 | documentation | skipped | 🌐 [i18n-KO] Translated feature_extractors.md to Korea | category not configured for this cumulative branch |
| huggingface#39541 | feature | aborted | Add Muon optimizer implementation and integration | codebase moved on: Muon optimizer integration conflicts across the current optimizer registry, Trainer optimizer factory path, TrainingArguments optimizer enum, import-utils optional backend plumbing… |
| huggingface#39534 | feature | aborted | Add Beit3 model | codebase moved on: the BEiT3 new-model PR conflicts in generated auto-configuration and image-processing mappings, and porting the 2.8k-line non-modular model would require reconciling current genera… |
| huggingface#39517 | documentation | skipped | 🌐 [i18n-KO] Translated compressed_tensor.md to Korean | category not configured for this cumulative branch |
| huggingface#39480 | feature | aborted | Add model arcinstitute state | codebase moved on: new state/state-transition model conflicts in generated auto configuration/modeling registries, and the added non-modular model files use older style/import conventions; safely lan… |
| huggingface#39466 | documentation | skipped | README: Update Bert Japanese model card | category not configured for this cumulative branch |
| huggingface#39403 | feature | aborted | Add Vocos model | codebase moved on: large Vocos/Vocos-EnCodec model integration conflicts across generated auto registries, processing_utils, config-attribute checks, docs TOC, and audio utilities; safely landing the… |
| huggingface#39357 | documentation | skipped | Update docstring for glm4v | category not configured for this cumulative branch |
| huggingface#39353 | defect | aborted | fix colpali mapping | codebase moved on: PR adds ColPali to the old VLMS checkpoint-conversion path and a class mapping, but current modeling_utils removed that VLMS conversion logic in favor of newer load-state/weight-ma… |
| huggingface#39303 | documentation | skipped | Fix critical typos in code example | category not configured for this cumulative branch |
| huggingface#39297 | defect | aborted | Fix bug with deepspeed and accelerator args in training_args.py | codebase moved on: PR narrows deepspeed and accelerator_config annotations to Optional[str] on an older TrainingArguments layout, while current code intentionally supports dict | str | None with _V… |
| huggingface#39293 | feature | aborted | Add T5LA models | codebase moved on: new T5LA model integration conflicts across generated auto mappings, docs TOC, model package exports, checker rules, and a deleted fx utility; landing it would require regenerating… |
| huggingface#39251 | defect | aborted | Fix slow test_moshika_greedy_unconditional_fp16 | codebase moved on: PR depends on an older cache_utils/generation/Moshi implementation, while the cumulative branch has a newer cache layer architecture and Moshi prepare_inputs_for_generation overrid… |
| huggingface#39236 | feature | aborted | added moment_p sampling | codebase moved on: moment-p sampling touches generation exports, configuration validation, logits processors, docs, and tests that now conflict with current generation APIs/registries; integrating th… |
| huggingface#39212 | documentation | skipped | Add Ukrainian translation of README.md | category not configured for this cumulative branch |
| huggingface#39209 | other | skipped | Standardize FSMT class naming: PretrainedFSMTModel → PreTrainedFSMTModel | category not configured for this cumulative branch |
| huggingface#39109 | defect | aborted | Fix: rename 'eval_strategy' to 'evaluation_strategy' in TrainingArgum… | codebase moved on: TrainingArguments in current v5 intentionally uses eval_strategy throughout docs, dataclass fields, post-init validation, and helper methods; the PR renames it back to evaluation_s… |
| huggingface#39084 | feature | aborted | Refactor gemma3n | codebase moved on: current Gemma3n is generated from strict modular/config code and already has conflicting audio naming updates; the PR refactors configuration, audio encoder placement, residual fie… |
| huggingface#39046 | defect | aborted | Fix deprecated max_size parameter handling in DETR image processors | codebase moved on: DETR-family image processors in current tree have been rewritten around ImageProcessorKwargs/SizeDict helpers, while the PR edits older processor internals and even adds backup/ad-… |
| huggingface#38991 | feature | aborted | Remove return_dict kwarg from all the models | codebase moved on: PR performs a sweeping return_dict signature/removal refactor across many model files, but current v5 model files have diverged/generated structure; direct merge conflicts immediat… |
| huggingface#38988 | defect | aborted | Check docstring inside modular files as well | codebase moved on: PR combines checker changes with widespread modular/model docstring rewrites from an older generated state; current tree has deleted/moved fast image processor and args_doc artifac… |
| huggingface#38962 | other | skipped | Update test_candidate_generator.py | category not configured for this cumulative branch |
| huggingface#38959 | documentation | skipped | Updated the model card for wav2vec2-phoneme | category not configured for this cumulative branch |
| huggingface#38958 | documentation | skipped | Updated model card for wav2vec2-conformer | category not configured for this cumulative branch |
| huggingface#38957 | documentation | skipped | Update wav2vec2-bert model card | category not configured for this cumulative branch |
| huggingface#38956 | documentation | skipped | Updating model card for wav2vec2 | category not configured for this cumulative branch |
| huggingface#38955 | documentation | skipped | docs: Musicgen melody model card | category not configured for this cumulative branch |
| huggingface#38926 | documentation | skipped | Clarify Python and framework version support in installation.md | category not configured for this cumulative branch |
| huggingface#38923 | feature | aborted | Remove deprecated max_size support from YOLOS image processor | codebase moved on: current YOLOS image processor is generated from modular_yolos and the inherited DETR v5 image processor kwargs path; the PR edits older generated init/preprocess/from_dict code… |
| huggingface#38893 | defect | aborted | Fix/deprecate max size conditional detr | codebase moved on: Conditional DETR image processors have been split/refactored into torch and PIL v5 processors generated from current image-processing APIs; the PR edits older max_size/from_dict/pr… |
| huggingface#38877 | documentation | skipped | DOC: Clarify attention_mask usage in BertModel forward method | category not configured for this cumulative branch |
| huggingface#38861 | feature | aborted | Add SamImageProcessorFast with 4x performance improvement | codebase moved on: current v5 SAM image processing uses the new TorchvisionBackend/PilBackend split and auto image processor mappings are generated/dict-based; the PR adds an older BaseImageProcessor… |
| huggingface#38859 | feature | aborted | Add MobileViT fast image processor | codebase moved on: current MobileViT image processing is already rewritten for the v5 TorchvisionBackend/PilBackend split and dynamic import structure; the PR adds an older BaseImageProcessorFast imp… |
| huggingface#38839 | other | skipped | [DO NOT MERGE] Testing saftensors 0.6.0 | category not configured for this cumulative branch |
| huggingface#38810 | feature | aborted | Add kwargs support in WhisperForConditionalGeneration | codebase moved on: Whisper forward signatures now use v5 type annotations, Cache, can_return_tuple, and kwargs forwarding; the PR is based on the older Optional/return_dict/head_mask API and its loss… |
| huggingface#38805 | feature | aborted | Add Dust3R | codebase moved on: adding Dust3R conflicts with current auto-generation/modular model structure and v5 image processor backend; the PR modifies fully generated auto mapping files, adds an older fast … |
| huggingface#38793 | documentation | skipped | Fix Typos in Comments and Improve Clarity | category not configured for this cumulative branch |
| huggingface#38786 | documentation | skipped | Provide clearer instructions on how to specify target language. | category not configured for this cumulative branch |
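
The status and category counts in the PR description can be cross-checked against the table rows. The following is a minimal sketch, not part of the mergeability flow itself: `parse_row` and `tally` are hypothetical helper names, the sample rows are abbreviated stand-ins for real rows, and the sketch assumes every data row starts with `|` and carries the PR reference, category, and status in its first three cells.

```python
from collections import Counter


def parse_row(line):
    """Return (pr, category, status) for one status-table row, or None."""
    line = line.strip()
    if not line.startswith("|"):
        return None
    # Split off only the first three cells: the title and reason cells
    # may themselves contain "|" (for example "dict | str | None").
    cells = line[1:].split("|", 3)
    if len(cells) < 4:
        return None
    pr, category, status = (cell.strip() for cell in cells[:3])
    # Skip header or separator rows, which carry no "#<number>" reference.
    if "#" not in pr:
        return None
    return pr, category, status


def tally(lines):
    """Count rows per status and per category."""
    by_status = Counter()
    by_category = Counter()
    for line in lines:
        row = parse_row(line)
        if row is None:
            continue
        _, category, status = row
        by_status[status] += 1
        by_category[category] += 1
    return by_status, by_category


# Abbreviated sample rows (illustrative, not the full report).
sample = [
    "| huggingface#39297 | defect | aborted | Fix bug with deepspeed | supports dict | str | None |",
    "| huggingface#39212 | documentation | skipped | Add Ukrainian translation of README.md | category not configured |",
    "Feature + defect flow status table (part 4/4)",
]
by_status, by_category = tally(sample)
print(dict(by_status))
print(dict(by_category))
```

Limiting the split to the first three separators keeps the parse stable even when a reason cell contains pipe characters, as the `dict | str | None` row does.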
